Alta Vista coverage much smaller than thought?

Prentiss Riddle riddle at is.rice.edu
Thu Mar 27 10:01:55 EST 1997


I'm curious what readers of Web4Lib think of John Pike's discovery that
Alta Vista retains only a small sample of web pages from large web
sites.  He says that Alta Vista has only about 600 pages from his
6,000-page site, and that Alta Vista admits to having only 300 pages
from the 300,000-member Geocities ISP!

You can find his message on the subject at:

        http://www5.zdnet.com/anchordesk/talkback/talkback_11638.html

And further analysis is at:

        http://www.melee.com/mica/index.html

The MICA site only compares Alta Vista and Hotbot, with results highly
critical of Alta Vista.  I think a more informative result would come
from comparing other web indexes as well (I'd be particularly interested
in analysis of Infoseek).  I see some other problems with the MICA
analysis as well (chiefly that it relies on the index sites' own
reported estimates of the number of hits they return).  Still, the
results are plausible, given Alta Vista's admission as cited by Pike.

I'm not surprised that there are holes in web indexes -- I routinely do
parallel searches in Alta Vista and Infoseek since I can count on
neither to be complete -- but I'm surprised that Alta Vista's coverage
would top out so low.  Have others corroborated Pike's and MICA's
findings?  Comments?

-- Prentiss Riddle ("aprendiz de todo, maestro de nada") riddle at rice.edu
-- RiceInfo Administrator, Rice University / http://is.rice.edu/~riddle
-- Home office: 2002-A Guadalupe St. #285, Austin, TX 78705 / 512-323-0708

