Classifying Web Sites
Alain Vaillancourt
NDGMTLCD at GSLIS.Lan.McGill.CA
Wed Jun 5 16:55:30 EDT 1996
>
> Anyway I just need some feedback on what's been running around in my head for the past couple of
> weeks and this seems like a good place to do it. What do you think?
>
Great idea to have the creators of pages classify them themselves, but
impossible to ask them to do it with the LC classification using the
tools currently available.
The LC classification is a hierarchical, enumerative monster.
No deviation from the norm is possible, so only full-time cataloguers
(okay, maybe ex-full-timers doing it 20 hours a week) can possibly use
it correctly.
Unless, of course, you developed some kind of simple "classifying by
example" program that would connect to a huge database of all
possible examples of LC classification codes. Even that would
not be enough to make the LC classification easy for laypeople to
apply. Something more would be needed. Can't get my head focused
on it right now.
Of course, the ideal would be to develop a true hierarchical
classification system for web pages. Yahoo has the beginnings of such
a system in its hierarchy, but it lacks the other essential
components of a system (such as a notation subsystem), and of course
the basic elements of its hierarchy are severely flawed, since they
started it without the benefit of any serious analysis. Only later
did they hire people (from the Artificial Intelligence community, by
the way, not from the library or indexing professions) with some
notion of what subject hierarchies require.
Do not wince at the idea of starting over when Dewey and LC already
exist. In the records management field we HAVE to start over each
time we get to a new corporation or institution, because the way one
huge company views information differs from the way another does.
Also, we have to give them the simplest possible systems, because
nearly everybody will have to apply them, and that alone rules out
the Dewey and LC hierarchies.
The history of records management (and to a lesser extent archives)
in North America is riddled with stories of disastrous attempts to
apply the LC classification or the Dewey classification (and its
cousin, the UDC) in the wrong circumstances: that is, where nearly
all users are casual users, as opposed to where an office or a group
of people works full time at putting the right codes on the documents.
Sound familiar, like the scenario of web page creators coding their
pages themselves? It is!
Systems that ape the LC or Dewey classifications in their complexity
have also led to countless disasters. The Dewey and LC systems
(and the UDC, and even the Colon Classification) are fine objects of
study for getting an idea of what it takes to build a classification
system, but you really have to do some reading on the matter to see
what must be simplified to make classification by the casual user
(a creator of web pages) a reality.
One of the best books on the subject, by the way, was written by an
Australian. If anybody knows a better introduction, or a better guide
to all of this, than "The Subject Approach to Information" by A.C.
Foskett, do tell.
Of course, while Foskett strives to consider information in general,
most of his examples deal with books and other published materials.
Web pages have attributes that belong more or less to the domain of
unpublished material. Having a publisher, a printer, a distributor
and who knows what else to impose standards makes something much
easier to control. Web pages, on the other hand, can pop up in any
size or form, and should thus be treated as records.
So they need a new classification system.
Again, do not groan; just think of the birth of the Dewey and LC
classification systems in the 19th century. They did not spring
forth complete and alone in a huge field. They emerged slowly,
alongside many other competing systems (a few of which still survive
here and there), to answer the needs of a new medium: the cheap book.
Before the middle of the 19th century, books were very expensive.
After the introduction of wood-based paper and steam-powered printing
presses, books began to be produced in horrendous numbers. Libraries
were suddenly packed to the gills, and new systems to shelve and find
the books were needed.
The fact that the Dewey and LC systems have survived to this day
(while the others perished) probably means that they were the most
adequate (helped also by the backing of two important institutions)
for the classification of printed books by experts. It does not mean
that they are the right choice for the coding of even more horrendous
numbers of web pages by casual users.
Au revoir!
Alain Vaillancourt
More information about the Web4lib mailing list