Friday, August 12, 2005

Short list of search engines

There are two preliminary criteria that a webometrics-friendly search engine must satisfy:

1. To have a large, independent, self-crawled database
2. To have a retrieval system that allows results to be filtered by URL-related delimiters
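
As a rough illustration of the second criterion, the sketch below builds a query restricted by URL-related delimiters. The `site:` and `filetype:` operator names, the helper functions and the `example.edu` domain are assumptions made for illustration; each engine has its own delimiter syntax, which should be checked before use.

```python
from urllib.parse import quote_plus


def site_query(domain, filetype=None):
    """Build a query limited to one web domain (hypothetical helper).

    Assumes the engine accepts `site:` and `filetype:` delimiters;
    the actual operator names differ from engine to engine.
    """
    terms = [f"site:{domain}"]
    if filetype:
        terms.append(f"filetype:{filetype}")
    return " ".join(terms)


def encode_query(query):
    """Percent-encode a query so it can be placed in a request URL."""
    return quote_plus(query)


if __name__ == "__main__":
    # example.edu is a placeholder domain, not a real institution
    query = site_query("example.edu", filetype="pdf")
    print(query)
    print(encode_query(query))
```

The count of results returned for such a query is the kind of figure a quantitative analysis would collect for each domain.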

Given these requirements, only six engines are currently useful for quantitative analysis:

- Google (www.google.com)
- Yahoo Search (search.yahoo.com)
- MSN Search (search.msn.com)
- Teoma (www.teoma.com)
- Gigablast (www.gigablast.com)
- Exalead (www.exalead.com)

Google's database probably exceeds 11 bn pages and includes good coverage of the so-called rich files and other dynamic and special file types. On the negative side, the number of dead links in this engine has increased considerably. Individual records of large databases (e.g., PubMed) are also indexed but, strangely, not in full.

Current figures for Yahoo are not available, as its database grew greatly during July. An educated guess is around the 10-12 bn mark. The result counts reported for many requests are misleading, because the reported number decreases as one moves beyond the first page of results.

MSN Search appears to have the most comprehensive geographical coverage, including Asian regions that the other major engines usually do not index well.

Gigablast and Exalead still have very small databases. However, as the overlap among engines is so low, a combined search using several engines is clearly an option.
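
The idea of a combined search can be sketched as follows: measure the overlap between the URL sets returned by two engines, then merge the sets from all engines. The function names and the sample URLs are hypothetical placeholders, not real search results, and the Jaccard index is only one possible overlap measure.

```python
def overlap(urls_a, urls_b):
    """Jaccard overlap between two result sets (0.0 = disjoint, 1.0 = identical)."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined(results_by_engine):
    """Merge the result sets of several engines into one set of unique URLs."""
    merged = set()
    for urls in results_by_engine.values():
        merged.update(urls)
    return merged


if __name__ == "__main__":
    # Placeholder data for illustration only
    results = {
        "engine_a": {"example.edu/a", "example.edu/b", "example.edu/c"},
        "engine_b": {"example.edu/c", "example.edu/d"},
    }
    print(overlap(results["engine_a"], results["engine_b"]))
    print(len(combined(results)))
```

When the overlap is low, the merged set is much larger than any single engine's set, which is what makes combining engines worthwhile.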

2 Comments:

At 1:20 am, Anonymous said...

I really can't understand why you use Exalead when it has a very limited number of sites and pages! If you exclude its size figures (for every university), then it should not have been used at all. Thus, it is not a good idea to use it as one of the two sources for backlinks (visibility).

 
At 11:52 pm, Blogger Unknown said...

I am surprised to see the California Institute of Technology ranked very low while the Massachusetts Institute of Technology is ranked #1. There must be some sort of mistake or overlooked data. Georgia Institute and several less prestigious universities are ranked above it. Examine your data.

John Awunganyi

 
