Classifying Web Sites

Wed Jun 5 16:55:30 EDT 1996

> 
> Anyway I just need some feedback on what's been running around in my head for the past couple of 
> weeks and this seems like a good place to do it. What do you think?
> 

Great idea to have the creators of pages classify them themselves, but 
impossible to ask them to do it with the LC classification and the 
tools currently available.

The LC classification is a hierarchical, enumerative monster.

No deviation from the norm is possible, so only full-time cataloguers 
(okay, maybe ex-full-timers doing it 20 hours a week) can possibly use 
it correctly.

Unless, of course, you developed some kind of simple "classifying by 
example" program that would connect to a huge database of all 
possible examples for LC classification codes.  Even that would not 
be enough to make the LC classification easy for the uninitiated to 
apply.  Something more would be needed.  I can't get my head focused 
on it right now.

Of course, the ideal would be to develop a true hierarchical 
classification system for web pages. Yahoo has the beginnings of such 
a system in its hierarchy, but it lacks the other essential 
components of a system (such as a notation sub-system), and of course 
the basic elements of its hierarchy are severely flawed, since Yahoo 
started it without the benefit of any serious analysis.  Only later 
did they hire people (from the Artificial Intelligence community, by 
the way, not from the library or indexing professions) with some 
notion of what subject hierarchies require.

Do not wince at the idea of starting over when Dewey and LC already 
exist.  In the records management field we HAVE to start over each 
time we get to a new corporation or institution, because the way 
organizations view information differs from one huge company to 
another. Also, we have to give them the simplest systems possible 
because nearly everybody will have to apply them, which rules out the 
Dewey and LC hierarchies.

The history of records management (and, to a lesser extent, archives) 
in North America is riddled with stories of disastrous attempts to 
apply the LC classification or the Dewey classification and its 
cousin the U.D.C. in the wrong circumstances: that is, in 
circumstances where nearly all users are casual users, as opposed 
to circumstances where an office or a group of people works full 
time at putting the right codes on the documents.

Sound like the scenario of web page creators coding their own pages? 
It is!

Systems that ape the LC or Dewey classifications in their complexity 
have also led to countless disasters.  The Dewey and LC systems 
(and the UDC, and even the Colon system) are fine objects of study to 
get an idea of what it takes to build a classification system, but 
you really have to do some reading on the matter to understand what 
you have to simplify to make classification by the casual user (a 
creator of web pages) a reality.

One of the best books on the subject, by the way, was written by an 
Australian.  If anybody knows a better introduction, a better guide 
to all of this than "The Subject Approach to Information" by A.C. 
Foskett, do tell.

Of course, while Foskett strives to consider information in general, 
most of his examples deal with books and other published materials.  
Web pages have attributes belonging more or less to the field of 
unpublished material. Having a publisher, a printer, a distributor 
and who knows what else to impose standards makes something much 
easier to control.  Web pages, on the other hand, can pop up in any 
size or form, and should thus be considered records.

So, they need a new classification system.

Again, do not groan; just think of the birth of the Dewey and LC 
classification systems in the 19th century.  They did not spring 
forth complete and alone in a huge field.  They emerged slowly among 
many other competing systems (a few of which still survive here and 
there) to answer the needs of a new medium: the cheap book.  Before 
the middle of the 19th century, books were very dear.  After the 
introduction of wood-based paper and steam printing presses, books 
began to be produced in horrendous numbers.  Libraries were suddenly 
packed to the gills, and new systems to shelve and find the books were 
needed.

The fact that the Dewey and LC systems have survived until now 
(while the others perished) probably means that they were the most 
adequate (given also the backing they had from two important 
institutions) for the classification of printed books by experts.  It 
does not mean that they are the right choice for the coding of even 
more horrendous numbers of web pages by casual users.

Au revoir!

Alain Vaillancourt