[Web4lib] Google limit of 1,000 results - end of the web?

Bernhard Eversberg ev at buch.biblio.etc.tu-bs.de
Mon Jul 18 08:16:27 EDT 2005

Lars Aronsson asked:
> Can we define the end of the web, i.e. can we have knowledge about 
> every webpage that exists?  

In principle and in theory, yes. In practice, however, no one has the 
storage space, the bandwidth, or the processing power to crawl into all 
the remotest corners and look at "everything" to see whether it is 
relevant. And even if someone did, we would almost always lack the time 
to look at everything that *is* probably relevant. The odds of this 
changing get worse all the time, because content keeps exploding all 
around us while our time is running out. So, not being able to know 
"every webpage that exists" means we can never be sure we have found 
all the best stuff, even if we see all the results any one gatherer has 
gathered. Just as no one library has all the best books on any one topic.

Google's one big invention, using the syndetic structure provided by 
web links, was only a re-invention of what the Science Citation Index has 
long been doing for journal articles. The SCI, however, also has 
keyword and name indexes; result sets from these are not forced into a 
single ranking by citation count with no other option, and their 
counts are exact.
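
To make the citation-count idea concrete, here is a minimal sketch of 
ranking items by how many distinct sources point at them. It is only 
an illustration of plain citation counting under my own assumptions: 
the function name and the toy link data are hypothetical, and this is 
neither Google's actual ranking method nor the SCI's.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank targets by the number of distinct sources citing them.

    `links` is an iterable of (source, target) pairs. This is plain
    citation counting, a hypothetical illustration only.
    """
    # Count each (source, target) pair once and ignore self-citations.
    distinct = {(src, dst) for src, dst in links if src != dst}
    counts = Counter(dst for _, dst in distinct)
    # Highest count first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy citation graph, not real data.
links = [
    ("a", "c"), ("b", "c"), ("d", "c"),
    ("a", "b"), ("c", "b"),
    ("a", "b"),  # duplicate citation, counted once
    ("d", "d"),  # self-citation, ignored
]
ranking = rank_by_inbound_links(links)
print(ranking)
```

Note that the counts produced this way are exact for the links that 
were gathered, and nothing forces the result to be shown in this order 
only; the same pairs could just as well be listed by name or keyword.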

8 bn documents indexed - that certainly dwarfs library collections and 
catalogs. But no one knows how much they are missing, or whether they 
would be capable of handling "everything". And yet, however large today's 
count of "everything" might be, it must still be a very long way from 
what a "googol", the word behind the name, actually means: a 1 followed 
by 100 zeros...
"Hype", thus, is not exactly the appropriate word for what they claim to 
be doing. They're awfully good at what they do, don't get me wrong, but 
they don't really do much, by their own yardstick or anyone else's. It only 
happens to be what most people think they need most of the time.

Bernhard Eversberg
Universitaetsbibliothek Braunschweig
