Organizing Web Information

Marc Salomon marc at ckm.ucsf.edu
Wed Jul 17 15:34:12 EDT 1996


<LCOHEN at cnsvax.albany.edu>
|One point that I think should be added to this discussion is the fact that
|there is rarely such a thing as a static Web page. Many Web pages evolve as a
|matter of course, and in fact can be expected to change in character after the
|time they will have been "cataloged".

Let's control our own vocabulary here.  Discussing web resources at the
granularity of a web page is not very useful and doesn't allow for a rich
definition of composite resources.

Composite structured resources available on the web can/will consist of various
components arranged according to the task at hand.  Yes, it is easier to alter
the contents of web resources, but not all web resources will be altered.  The
concept of the static monograph and periodical has already evolved onto the
web, and coexists with applications that generate each custom page on-the-fly.

|There are also certain records for which a controlled vocabulary is less than
|useful--in a search for a proper name, or a unique scientific term, for
|example.

This is the argument for cataloging rules and smart search interfaces.  I just
had the hardest time searching Medline for an article by a Dr. Rachel
Gittleman-Klein cited in a recent monograph.  It took me more than a day, and
help from a librarian, to find that damn hyphenated name even though it was
encoded by the NLM according to its rules.

|I see the Internet as one (currently) not very large  computer database, and I
|think it would be helpful  to look at it more in terms of search engine power
|rather than the cataloging of a slippery, ever-changing mass of records.

Brian Behlendorf has called the WWW the world wide hash table (a hash table is
a data structure that boils each key down to a relatively small integer, which
points to a list of the resources whose keys boiled down to the same number).
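
To make that parenthetical concrete, here is a minimal sketch of a hash table
with chaining, in Python.  The class name, the bucket count, and the toy hash
(sum of the key's bytes modulo the bucket count) are assumptions chosen for
illustration; nothing here is specific to how the web or any search engine
actually stores things.

```python
class ChainedHashTable:
    """Toy hash table: each key boils down to a small integer (a bucket
    index), and each bucket holds every entry whose key landed there."""

    def __init__(self, num_buckets=8):
        # One list per bucket; colliding keys share a list.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Illustrative hash only: sum of UTF-8 bytes, reduced to a bucket number.
        return sum(key.encode("utf-8")) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the one bucket is scanned, not the whole table.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default
```

With this toy hash, the keys "ab" and "ba" boil down to the same number, so
they sit in the same bucket and are told apart only by comparing the full key.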

The set of objects addressable over the WWW is anything but small, and the task
of walking the tree to update a search engine is itself imperfect, as evidenced
by the number of bad links that search engines return as results.
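
A rough sketch of why that happens, in Python: an index is a snapshot taken
during one walk, and anything removed afterward stays in the index until the
next walk.  The dictionary standing in for the web, the paths in it, and both
function names are hypothetical; a real crawler fetches pages over HTTP and is
far more involved.

```python
from collections import deque


def crawl(web, start):
    """Walk links breadth-first from `start`; return the set of pages found.

    `web` is a stand-in for the real thing: a dict mapping each existing
    page to the list of pages it links to.
    """
    seen = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url not in web:
            continue  # already visited, or the link is already dead
        seen.add(url)
        queue.extend(web[url])
    return seen


def dead_links(index, web):
    """Pages the index still lists but which no longer exist."""
    return sorted(url for url in index if url not in web)


# Hypothetical four-page site, indexed once and then changed.
web = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}
index = crawl(web, "/")
del web["/c"]  # the page goes away after the walk
stale = dead_links(index, web)
```

Here `stale` comes out as `["/c"]`: the index keeps offering a page that is
gone, and only another walk would notice.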

-marc


More information about the Web4lib mailing list