Organizing Web Information

Wed Jul 17 13:02:40 EDT 1996

Howard Pasternack writes:

>Web pages are not books.  They are most akin to individual documents found
>in manuscript collections.

That's a pretty big generalization. Some web documents are (like) books.
Some are data sets, brochures, articles, manuscripts, directories, resumes,
flyers, candy wrappers... Only a subset of these are going to be useful to
anyone other than anthropologists and the authors' families.

There seems to be an implicit assumption in this thread that web documents
form some kind of natural class. Well, maybe they do, but only in the sense
that 'paper documents' are also a class. I haven't heard of any
institution, library or otherwise, that has attempted to collect, catalog,
or index all paper documents regardless of content. What we do is select:
libraries have generally collected more books than flyers or brochures,
because we consider their *content* more useful to our users.
Someone, I forget who, seemed to imply that it was elitist to discriminate
between web documents based on content. But if I'm doing research in
Linguistics, and I'm searching an index for resources in Linguistics, I
probably don't want to get hundreds of Linguistics grad students' home
pages in my result set along with the language corpora and articles.

In short, I don't see why the mere existence of millions of web pages is
any more relevant to the cataloging/indexing problem than the existence of
millions of flyers and brochures is.

PG
_______________________________
Peter C. Gorman
University of Wisconsin
General Library System
Automation Services
pcgorman at facstaff.wisc.edu