Serendipity, re: organizing schemes

Mon May 3 11:25:24 EDT 1999

At 07:25 AM 5/3/99 -0700, Minkel, Walter (Cahners -NYC) wrote:
>I have a devil's advocate question in response to the request below: why is
>it a good thing to sort Web collections by Dewey or LC? Dewey or LC are
>useful for items that cannot be searched by keywords, but why is it
>necessary, other than the fact that the books have call numbers, to sort Web
>sites that way? Yeah, I know that Web sites are being assigned call numbers
>in many library catalogs, but is it necessary if keywording & assigning
>subject headings is done carefully? In a public library situation (where my
>experience lies) call numbers were a way to arrange things together on the
>shelf & that was about it. But Web sites have no "shelf" to sit on.

While call numbers themselves may disappear, the idea of grouping related 
materials makes more sense than ever, IMO.  This is the only way that 
people will find things serendipitously.  With so much information on-line, 
most searches on content have limited value, because a search on words will 
return too many items most of the time.  Categories provide context, which 
complements content search in at least two ways.  By returning categories 
as the result of a search, an enormous list of results becomes manageable, 
allowing broad searching.  By searching within categories, the scope can be 
narrowed to the point where the results of word searches return reasonable 
lists of individual items.  Digging through a huge collection of resources 
becomes much easier when you can choose at any point to switch between 
searching content (text search) and navigating context (categories).  This 
was the idea behind the Verity Knowledge Organizer (for which I was the 
product manager) -- a hybrid of something like Yahoo and something like 
AltaVista, but integrated much more than any other product or service that 
I know of.
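
To make the two uses of categories concrete, here is a minimal Python
sketch.  It is only an illustration under my own assumptions, not how the
Verity product was built: the toy DOCS collection, the category paths and
the function names (matches, search_categories, search_within) are all
hypothetical.

```python
from collections import defaultdict

# Hypothetical toy collection: each document has a category path and text.
DOCS = [
    {"id": 1, "category": "History/Printing",
     "text": "Gutenberg and the movable type press"},
    {"id": 2, "category": "Technology/Printing",
     "text": "Offset press maintenance guide"},
    {"id": 3, "category": "Technology/Printing",
     "text": "Laser printer toner chemistry"},
    {"id": 4, "category": "Religion/Bibles",
     "text": "The Gutenberg Bible and its readers"},
]


def matches(doc, word):
    """Naive content match: the word appears as a whole token in the text."""
    return word.lower() in doc["text"].lower().split()


def search_categories(docs, word):
    """Broad search: return categories with hit counts, not a flat list."""
    counts = defaultdict(int)
    for doc in docs:
        if matches(doc, word):
            counts[doc["category"]] += 1
    return dict(counts)


def search_within(docs, word, category_prefix):
    """Narrow search: word search restricted to one branch of the categories."""
    return [
        doc["id"]
        for doc in docs
        if doc["category"].startswith(category_prefix) and matches(doc, word)
    ]
```

A searcher might call search_categories(DOCS, "press") to see which
categories contain hits, pick one, and then call
search_within(DOCS, "press", "Technology") to get the individual items.
Switching between the two calls at any point is the content/context
switch described above.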

There is a third benefit that is just beginning to emerge, though I suspect 
in the long run it's the one that will really change information search and 
retrieval.  Unlike an organizing system implemented in the physical world, 
a digital system can support multiple, overlapping classification 
schemes.  This allows the researcher to ask a new kind of question -- how 
does a given resource or category look from other contexts, other points of 
view?  For example, one could compare how technologists, historians and 
religious scholars categorize documents about printing. Library 
classification systems allow some of this kind of exploration, but it is 
very constrained by physical catalogs and shelves.  I have come to believe 
rather strongly that easy access to multiple points of view is a powerful 
engine of creativity, helping us to see patterns and analyze at higher 
levels.  For example, it was only fairly recently that biologists and 
physicists realized that they were using substantially similar mathematical 
models to describe certain aspects of their fields.  When they sat down to 
figure out why, our understanding of complex systems took a leap 
forward.  This kind of discovery should accelerate as we realize that order 
can emerge from the seeming chaos of a web of information.
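
A rough Python sketch of what multiple, overlapping schemes could look
like as data.  Everything here is made up for illustration: the three
schemes, their category names, the document ids and the helper functions
views_of and related_across are assumptions, not any real catalog.

```python
# Hypothetical schemes: each maps a category name to the resources it holds.
# The same resource may be filed under several schemes at once.
SCHEMES = {
    "technologist": {
        "Movable type": {"doc-a", "doc-b"},
        "Paper making": {"doc-c"},
    },
    "historian": {
        "Renaissance Europe": {"doc-a", "doc-c"},
        "Industrial era": {"doc-b"},
    },
    "religious scholar": {
        "Reformation": {"doc-a"},
    },
}


def views_of(resource, schemes):
    """How does one resource look from each point of view?"""
    result = {}
    for scheme, categories in schemes.items():
        found = sorted(
            name for name, members in categories.items() if resource in members
        )
        if found:
            result[scheme] = found
    return result


def related_across(category, scheme, schemes):
    """Which categories in the other schemes share resources with this one?"""
    members = schemes[scheme][category]
    overlaps = {}
    for other, categories in schemes.items():
        if other == scheme:
            continue
        for name, other_members in categories.items():
            shared = members & other_members
            if shared:
                overlaps[(other, name)] = sorted(shared)
    return overlaps
```

views_of("doc-a", SCHEMES) answers the question of how a single document
is categorized by each group, and related_across("Movable type",
"technologist", SCHEMES) shows where the technologist's category overlaps
with the historian's and the religious scholar's.  Neither question can be
asked of a single physical shelf arrangement.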

Librarians probably understand better than almost anyone that it will be a 
long time, if ever, before computers can really figure out how our minds 
categorize things -- that ability is clearly a key to language and 
knowledge.  In the meantime, by recording the correlations we invent in 
categorization, digital information systems can be serendipity engines 
without having to analyze causality (with little understanding of the 
information they store, that is).  Searching content is causal, based on 
the logic of a priori knowledge such as "documents about the history of 
printing are likely to mention Gutenberg."  Navigating context is only 
based on correlation: you can discover the fact that such documents mention 
Gutenberg, even if you -- and more importantly, the computer -- didn't know it.
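
As a small illustration of discovery by correlation, the Python sketch
below takes the documents a person has filed under one category and
reports the terms most of them share, with no prior knowledge of the
subject.  The sample texts, the stopword list, the 0.75 threshold and the
function name common_terms are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical texts that someone filed under "history of printing".
HISTORY_OF_PRINTING = [
    "Gutenberg built a press in Mainz",
    "Early printing spread after Gutenberg",
    "The Gutenberg Bible and early type",
]

# A tiny, assumed stopword list; a real one would be much longer.
STOPWORDS = frozenset({"the", "and", "of", "a", "in"})


def common_terms(texts, min_share=0.75, stopwords=STOPWORDS):
    """Return terms that occur in at least min_share of the given texts."""
    doc_freq = Counter()
    for text in texts:
        # Count each word once per document (document frequency).
        words = {w.strip(".,").lower() for w in text.split()}
        doc_freq.update(words - stopwords)
    threshold = min_share * len(texts)
    return sorted(w for w, n in doc_freq.items() if n >= threshold)
```

Calling common_terms(HISTORY_OF_PRINTING) surfaces "gutenberg" purely
because the human-made category groups those documents together.  The
program never needed to be told that Gutenberg matters to printing.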

Nick