Cataloging the Internet

Mon Oct 16 20:28:15 EDT 1995

>>> Uh huh, kindly let us know when the staff of Kansas City Public Library have
>>> finished cataloging, classifying, and indexing the Internet :-)

>> OK, just as soon as you let us know when you've finished cataloging, 
>> classifying and indexing all the books in the world.  If there's a fundamental
>> difference between these two tasks then I'd be interested in knowing about it.

> So what is the difference? Can you catalog a moving target or do you have 
> to pattern recognize it? 

All cataloged resources are moving targets.  It's just that some move faster
than others.  How this cataloging is actually done seems to me to be a point 
of expediency more than anything else.

>                          Wouldn't it be more reasonable to imagine an 
> uncataloged internet which includes cataloged resources such as libraries?

This is my point.  The print medium isn't cataloged either; it's just been 
made more manageable over time and with a lot of effort.  This is going to 
happen to the Internet one way or another, and it strikes me that libraries, 
of all concerned institutions, are best equipped to bring this about in an 
orderly way, particularly as much of the apparatus needed to do so is already
in place.  

> One of the advantages of libraries is not that they include everything 
> but that they exclude most things. The internet on the other hand seems 
> to include everything.

I think there's an assumption at work among librarians that any attempt at 
bibliographic control of the Internet must be comprehensive or attempt to be
such.  This is, of course, ridiculous.  No library attempts this sort of thing.
Libraries collect resources of interest to themselves, often sharing them with 
other libraries to spread out costs, etc.  Comprehensive bibliographic control 
of ANY publication medium, including those already familiar to us, is an 
illusion, albeit a necessary one.

However much Yahoo, Lycos or any intelligent-agent effort to automatically
render the Internet comprehensible may tempt us, any kind of real bibliographic
control of the Internet will more likely come about as libraries (and others)
select (i.e., establish and maintain links to) the Internet resources of interest 
to them, and find ways to share the resources they thus create in a workable
fashion.  The underlying assumptions I am making are: 1) this can best be 
achieved by extending the existing principles, apparatus and methodology of 
librarianship to the Internet, 2) this approach will produce more effective 
bibliographic control of the Internet than any other.  In support of these, 
I'd argue that the measure of bibliographic control we've achieved thus far
with print and other materials seems to work pretty well, and can indeed be 
achieved for Internet resources if a fair number of libraries put their minds 
to it.  Bear in mind that those big MARC databases at OCLC and elsewhere had 
to come from somewhere...  and once again, I don't think the Internet is so
different from any other publication medium as to render this task
impossible.  The big leap seems to be accepting that it _is_ a medium at all.

Paul Neff
Internet Librarian, Kansas City Public Library
paul at kcpl.lib.mo.us

PS:
A good role for existing Internet indexes in this sort of approach might be 
as finding aids for selection and intelligent assistants for catalogers.  
While I'm skeptical of the ability of any parser, indexer or intelligent agent 
to determine the subject matter of a resource all by itself, I'd guess that 
the indexes these generate could help catalogers decide this more easily.
(Not to say I wouldn't like to see more original catalogers employed. :-)