[Web4lib] Google limit of 1,000 results - end of the web?
ev at buch.biblio.etc.tu-bs.de
Mon Jul 18 08:16:27 EDT 2005
Lars Aronsson asked:
> Can we define the end of the web, i.e. can we have knowledge about
> every webpage that exists?
In principle and in theory, yes. In practice, however, no one has the
storage space, bandwidth, or processing power to crawl into all the
remotest corners and look at "everything" to find out what is relevant.
And even if that were not the case, we would almost always lack the time
to look at everything that *is* probably relevant. The odds that this
might change keep deteriorating - content explodes all around us while
our time runs out. So, not being able to know "every webpage that
exists" means we can never be sure of having found all the best stuff,
even if we see all the results any one gatherer has gathered. Just as no
one library has all the best books on any one topic.
Google's one big invention - exploiting the syndetic structure provided
by web links - was really a re-invention of what the Science Citation
Index (SCI) had long been doing for journal articles. The SCI, however,
also has keyword and name indexes; its result sets are not presented in
citation-count ranking order with no other option, and its counts are
exact.
8 bn documents indexed - that certainly dwarfs library collections and
catalogs. But no one knows how much they are missing, or whether they
could ever handle "everything". And however large today's count of
"everything" might be, it is still a very long way from what a "googol"
(the number the name Google derives from) actually means: a one followed
by 100 zeros...
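Just to put the gap in perspective, a quick back-of-the-envelope check
(a minimal sketch; the 8 bn figure is the index size cited above, and
"googol" is taken as 10^100):

```python
# A googol is 10**100: a one followed by 100 zeros (101 digits in all).
googol = 10 ** 100

# Google's indexed document count as cited above: 8 billion.
indexed = 8_000_000_000

# How many digits does a googol have, and what fraction of it is 8 bn?
print(len(str(googol)))      # 101
print(indexed / googol)      # 8e-91 - a vanishingly small fraction
```

In other words, even an 8-billion-document index covers less than one
part in 10^90 of its own namesake.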
"Hype", thus, is not exactly the appropriate word for what they claim to
be doing. They're awfully good at what they do, don't get me wrong, but
they don't really do much, by their own yardstick and others. It just
happens to be what most people think they need most of the time.