Yahoo, LII, etc.

Steve Thomas sthomas at library.adelaide.edu.au
Wed Nov 19 21:08:26 EST 1997


At 07:52 AM 97/11/19 -0800, Karen G. Schneider wrote:
>...  We keep saying in
>our profession that we can't possibly have one catalog for the Internet...
>we can't possibly organize the Internet... yada yada yada... but  how do we
>know that?  ...   Why
>haven't we started with the proposition that this was doable?  Why is it
>that we accept a mammoth union database like OCLC but can't project this to
>the online environment?  Why haven't we done our OWN Yahoo, sans the
>McInternet stuff?
>...
>dreaming away in the Northeast... there are over 150,000 librarians in the
>U.S.; we can't do this?


Karen raises some good questions and some reasonable suggestions... but I
think (if I've understood her argument) that they may be the wrong questions.

This discussion began with "what's wrong with Yahoo?" (or any of the net
search engines, for that matter). As I see it, what's wrong primarily is a
lack of selection ... the typical search bot just goes out and indexes
everything it finds. Does a library collect everything that's printed (or
even published)? Of course not. But the search engines do. [Yahoo at
least claimed to make some effort at selection, but ...]  Add to that the
fact that search engines typically convey little useful information about
their indexed pages, and you have the current situation: basically, if
these were libraries, you'd say their cataloguing was bad and their
collection practices were indiscriminate.

Now, long before the Internet, libraries were doing a pretty good job of
cataloguing and indexing published works. With the advent of computers,
libraries were able to develop some pretty nifty systems for cataloguing,
organising, indexing and retrieving information about their collections.

Comes the Internet, then, and suddenly libraries are lost. First we ignored
it. Then we abdicated our role to the likes of Lycos and Yahoo. Some of us
panicked at predictions of the death of the library. But slowly, slowly, we
seem to be coming to grips with the thing, with LC introducing the 856 MARC
field, OCLC launching the PURL project, and work proceeding on metadata.

The other interesting recent development is in Web-interfaced catalogues
(e.g. WebPAC), which finally make it possible to use a traditional
library catalogue as an indexing tool for Web pages. By including the URL
information (via the 856 field), a Web catalogue displays the URL as a link
that can take the user directly to the resource.
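To make that concrete, here's a minimal sketch in Python of how a Web
catalogue might turn an 856 field into a link. The record layout is
invented for illustration (real MARC handling is rather more involved):

    # A minimal sketch, not real MARC-handling code: assume each 856
    # field arrives as a dict of subfields, with the URL in subfield u
    # and a display label in subfield z.
    def link_from_856(subfields):
        """Render an 856 field's subfields as an HTML anchor."""
        url = subfields.get('u')
        if url is None:
            return ''          # no URL in the field; nothing to link
        label = subfields.get('z', url)
        return '<a href="%s">%s</a>' % (url, label)

    # e.g. a catalogue record for a Web resource:
    print(link_from_856({'u': 'http://www.example.com/guide.html',
                         'z': 'Example resource'}))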

So pretty soon now, I predict a large number of librarians are going to
wake up and realise that all we have to do to index the Internet is to use
our catalogues: determine which resources are of worth to our customers,
and create catalogue records for them.

Now of course, "indexing the Internet" as a whole is the wrong way to put
it, and probably impossible anyway. But indexing (or cataloguing) the parts
of the Internet which are desirable (by whatever collection criteria we
wish to determine) is certainly possible, and most importantly, possible
WITH EXISTING PRACTICE. In other words, libraries just continue to do what
they've always done, with the only difference being that these things are
electronic rather than paper or other 'traditional' media.

To me, it seems that there are only two obstacles to overcome here:

1. The 'startup' effort required is huge, even with good selection. I guess
this can be solved by grant money (imagine for a moment if the money spent
on Yahoo, Alta Vista etc had gone to libraries instead!), and by the usual
cooperative effort that libraries excel at.

2. Maintaining currency is a problem, on a scale not previously faced with
traditional library collections. If you have a book in the collection
today, you can expect to have it many years hence -- barring theft, of
course. But adding Internet resources to your catalogue is different, to
the extent that you're cataloguing something that you don't physically
have, so its currency is out of your control. Good selection will overcome
this problem to some extent, because sites worth 'collecting' are more
likely to stick around. Initiatives like PURL are going to be critically
important here though, since with the best will in the world a Web master
cannot guarantee that a URL will never change. But maintaining URLs is not
impossible, or even difficult if done right (see the sketch below).
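Here's a rough sketch of the sort of currency check I have in mind,
assuming the catalogued URLs can be exported to a plain text file, one per
line (the filename is made up). Standard library Python only:

    import urllib.request
    import urllib.error

    def check_urls(filename):
        """Print any catalogued URL that no longer responds."""
        with open(filename) as f:
            for line in f:
                url = line.strip()
                if not url:
                    continue
                try:
                    # A HEAD request is enough: we only need to know that
                    # the resource is still there, not what it contains.
                    req = urllib.request.Request(url, method='HEAD')
                    urllib.request.urlopen(req, timeout=10)
                except (urllib.error.URLError, ValueError) as err:
                    print('%s -- needs attention (%s)' % (url, err))

    check_urls('catalogued_urls.txt')

Run something like that on a regular schedule and you have a manageable
maintenance task rather than an impossible one.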


Finally, and thank you for reading this far, a word about metadata. Worthy
as this idea is, I don't see it as a final answer to the problem (or even
close, actually). While some page authors will go to the trouble of adding
metadata to their pages, many will not. Are we going to limit cataloguing
activity to those pages with metadata? I don't think so, therefore it
really becomes a side issue. Additionally, even if there is metadata, are
cataloguers likely to accept it at face value? Not the ones I know! E.g. if
there's a subject heading, is it LC, or MeSH, or ... what? If it's claimed
to be LC, is it valid? So the use of metadata is interesting but not the
main issue.
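For what it's worth, extracting whatever metadata a page does carry is the
easy part, as the sketch below shows (Python's standard HTML parser, fed a
made-up sample page). The hard part -- deciding whether a claimed heading
really is a valid LC heading -- is precisely what no script can do:

    from html.parser import HTMLParser

    class MetaCollector(HTMLParser):
        """Collect name/content pairs from a page's META tags."""
        def __init__(self):
            super().__init__()
            self.meta = {}
        def handle_starttag(self, tag, attrs):
            if tag == 'meta':
                d = dict(attrs)
                if 'name' in d and 'content' in d:
                    self.meta[d['name']] = d['content']

    page = ('<html><head>'
            '<meta name="subject" content="Libraries -- Automation">'
            '</head><body>...</body></html>')
    collector = MetaCollector()
    collector.feed(page)
    print(collector.meta)   # {'subject': 'Libraries -- Automation'}
    # Which scheme is that heading from? LC? MeSH? The page can't say.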


To conclude then: we don't need another Web search tool. We have the search
tools already in our library catalogues. What we need are the records.


Thanks for listening.

Steve

 ___________________________________________________________________________
 Stephen Thomas, Senior Systems Analyst
 Mail : Barr Smith Library, The University of Adelaide, South Australia 5005
 Phone: (08) 8303 5190                                   Fax: (08) 8303 4369
 Email: sthomas at library.adelaide.edu.au
 URL  : http://library.adelaide.edu.au/ual/staff/sthomas.html
 ** Unless otherwise stated, the content of this message reflects only my **
 ** own opinion, and not the policy of the University of Adelaide Library.**

 "I must Create a System, or be enslav'd by another Man's" -- William Blake


