Organizing Web Information

Howard Pasternack BLIPS15 at BROWNVM.brown.edu
Wed Jul 17 09:47:13 EDT 1996


I guess I would like to go back to basics and question whether it is
realistic to expect any controlled vocabulary, whether it be LCSH or
some other indexing scheme, to be able to deal with something as large and
as dynamic as the web.

All of the indexing vocabularies and thesauri currently in use are under the
purview of an agency which is also responsible for reviewing the literature
and adding new terms based on the vocabulary used in that literature.
This goes for LCSH, MeSH, Engineering Index, etc., etc.  Leaving aside
the efficacy of these indexing vocabularies, the web indexing schemes,
whether carried out by librarians or by authors, are based on the premise
of distributed indexing across a large number of institutions/individuals.
So, my question would be how the vocabulary is supposed to be supported.
If a new technical term appears in the literature, is one supposed to
wait nine months for the powers that be to bless the term before it can be
assigned to a web page?  And if the vocabulary is uncontrolled, then the
results are not much different from automatic indexing.

Alta Vista and the others may need additional work on their retrieval algorithms
and relevance rankings, but I have yet to see any research showing that they
do worse than LCSH applied to a database of a million-plus records.

Web pages are not books.  They are most akin to individual documents found
in manuscript collections.  For the most part we do not index the individual
manuscript pages because we can't.  We're lucky if we can index at the
collection level.  So, maybe librarians should focus on identifying useful
sites and leave the indexing of the pages to automatic schemes like
Alta Vista.

Howard Pasternack
Brown University

