Not my fault
When visitors check the Webometrics Ranking of Universities, the institution they are looking for is sometimes missing from the Catalogue, or its rank is clearly lower than expected. There are several technical reasons for these situations:
Universities without an independent domain name. The ranking and the catalogue only include universities with their own institutional domain. More than 200 institutions worldwide still publish their pages in directories under the domain of their internet hosting company or another shared domain. This limitation prevents us from calculating the WR for many African universities (Nigerian federal ones).
Universities with multiple institutional domains. Most universities have extra domains unrelated to the main institutional one, usually devoted to specific, low-impact projects or spin-off organizations. However, there are some striking situations where two equally valid names are used for the same university, or where a large part of the sites are under another (older?) domain. It is not feasible to combine the data for the different domains, so the global WR of the institution does not reflect its real impact. Two good examples are Imperial College (ic.ac.uk, imperial.ac.uk) and St. Petersburg University (spbu.ru, pu.ru).
Changes in the institutional domain. The use of a new domain imposes an important penalty on the calculation of several web indicators. The recent split of the Jussieu campus (jussieu.fr) of the University of Paris (now .upmc.fr, .univ-paris7.fr and others) is one example. Unfortunately, this is a fairly common situation that affects even universities in the Top 100. A worrisome example is the renaming of the academic subdomains of many Indian universities, which could explain their lagging positions.
Use of shared domains. Different universities and research centres share the same or similar domains in at least two groups of French universities: Marseille (almost fixed) and Strasbourg. This situation is somewhat messy, and the web rankings of these institutions do not reflect their true position. Something similar applies to Helsinki University, which on the contrary has an overvalued rank because the city also uses the helsinki.fi domain. There is no single domain for the Universidad de la Republica of Uruguay, so each Faculty has its own institutional domain.
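The domain problems described above can be illustrated with a minimal Python sketch. It assigns page URLs to catalogue domains by host suffix, which is only an assumption about how such a grouping might be done and not the method used for the ranking. The domain list, the function names and the example hosts are illustrative.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative catalogue entries taken from the examples in the text.
CATALOGUE_DOMAINS = ["ic.ac.uk", "imperial.ac.uk", "helsinki.fi"]


def institutional_domain(url, domains=CATALOGUE_DOMAINS):
    """Return the catalogue domain the URL's host falls under, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain in domains:
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def pages_per_domain(urls):
    """Count URLs per catalogue domain; unmatched URLs are counted under None."""
    return Counter(institutional_domain(url) for url in urls)
```

With this kind of grouping, pages under ic.ac.uk and imperial.ac.uk end up in two separate entries although they belong to one institution, while every page under helsinki.fi falls into a single entry whether it was published by the university or by the city.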
Invoked on the Web
Citation analysis is not the only way to analyze the bibliographic characteristics of a paper, although it is a key method for bibliometric studies. In a similar way, some proposals intend to restrict webometric studies to link analysis, and specifically to sitation analysis, the study of formal links between electronic papers. Obviously, there are many other possibilities for exploiting web data, including informal references, which can benefit from the large sample size of the webspace.
Cronin et al. (Invoked on the Web. Journal of the American Society for Information Science, 49 (14): 1319-1328) proposed in 1998 to analyse the number of times a researcher’s name appears in web pages. The same approach is equally valid for the title of a paper (introduced formally by Liwen Vaughan and also used by Hildrun Kretschmer), the name of an institution, or selected terms or phrases (you can check some of the papers by Judit Bar-Ilan).
From a methodological point of view, and bearing in mind that search engines do not cover the whole Web and that there are important biases in their results, invocation can be calculated easily by putting quotation marks around the name in a search engine. The result is referred to as the number of times this name is cited on the Web. Some authors call it Web visibility, although we prefer to reserve this term for link visibility. This indicator usually favours large, well-known, old institutions, regardless of the real effort they make to have a relevant Web presence.
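As a minimal sketch of the quoted-name query described above, the following Python functions wrap a name in quotation marks and encode it into a search URL. The base URL `https://search.example.org/search` and the parameter name `q` are placeholders and assumptions, not the address of any particular engine. Reading the hit count from the results page is engine-specific and is left out.

```python
from urllib.parse import urlencode


def invocation_query(name):
    """Wrap a name in double quotes so the engine treats it as an exact phrase."""
    # Drop embedded quotes and collapse whitespace so the phrase stays intact.
    cleaned = " ".join(name.replace('"', " ").split())
    return f'"{cleaned}"'


def invocation_url(name, base_url="https://search.example.org/search", param="q"):
    """Build a search URL for the quoted name (base_url and param are placeholders)."""
    return f"{base_url}?{urlencode({param: invocation_query(name)})}"
```

For example, `invocation_query("Universidad de la Republica")` yields the phrase with the quotation marks included, ready to be submitted to each engine under study.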
Some Peruvian universities were chosen to compare several webometric indicators. When they are ranked by invocation, some placements do not match those obtained with the other indicators.

Although invocation measures can be interesting for some analyses, caution is recommended, as it is not possible to assign a unique, unambiguous, universal name to every institution.
Short list of search engines
There are two preliminary criteria that a webometrics-friendly search engine must satisfy:
1. It has a large, independent, self-crawled database
2. Its retrieval system allows results to be filtered with URL-related delimiters
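To illustrate the second criterion, here is a minimal Python sketch that builds a query restricted by a URL-related delimiter. It assumes the `site:` operator, with `-site:` for exclusion, which several engines accept. The original text does not name a specific operator, and the exact syntax differs between engines, so the function is only an example.

```python
def site_query(domain, phrase=None, exclude=()):
    """Build a query limited to one domain, optionally excluding subdomains.

    Assumes the engine understands 'site:' and '-site:' (an assumption;
    check each engine's own syntax before use).
    """
    parts = [f"site:{domain}"]
    parts += [f"-site:{sub}" for sub in exclude]
    if phrase:
        # Quote the phrase so it is matched exactly.
        parts.append(f'"{phrase}"')
    return " ".join(parts)
```

A call such as `site_query("upmc.fr")` asks for all indexed pages under that domain, and the number of results the engine reports can then be used as a size indicator.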
Taking into account these requirements, currently only six engines are useful for quantitative analysis purposes:
- Google (www.google.com)
- Yahoo Search (search.yahoo.com)
- MSN Search (search.msn.com)
- Teoma (www.teoma.com)
- Gigablast (www.gigablast.com)
- Exalead (www.exalead.com)
Google has a database that probably exceeds 11 bn pages, with good coverage of the so-called rich files and of other dynamic and special filetypes. On the negative side, the number of dead links in this engine has increased considerably. Individual records of large databases (e.g. PubMed) are also indexed but, strangely, not in full.
Current figures for Yahoo are not available, as its databases grew greatly during July. An educated guess is around the 10-12 bn mark. The figures returned for a large number of requests are misleading, as the reported number of results decreases when exploring pages beyond the first page of answers.
MSN Search appears to have the most comprehensive geographical coverage, including Asian regions that are usually not well indexed by the other major engines.
Gigablast and Exalead still have very small databases. However, as the overlap among engines is so low, combined search using several engines is clearly an option.
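The combined search mentioned above can be sketched in Python as a union of the URL sets returned by each engine, together with a pairwise overlap measure. Collecting the URLs from each engine is assumed to have been done elsewhere. The function names, the engine labels and the choice of the Jaccard index as overlap measure are assumptions made for illustration.

```python
from itertools import combinations


def combine_results(results_by_engine):
    """Return the union of the URLs found by all engines."""
    combined = set()
    for urls in results_by_engine.values():
        combined |= set(urls)
    return combined


def pairwise_overlap(results_by_engine):
    """Jaccard overlap (shared URLs / all URLs) for every pair of engines."""
    overlaps = {}
    for a, b in combinations(sorted(results_by_engine), 2):
        set_a, set_b = set(results_by_engine[a]), set(results_by_engine[b])
        union = set_a | set_b
        overlaps[(a, b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlaps
```

When the overlap between two engines is low, the combined set is much larger than either individual result set, which is the case in which combining engines pays off.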