SHOE-MARC

Fri Aug 23 14:56:14 EDT 1996

-------------------------------------------------------------------------------
Gerry McKiernan asked me to respond in this email list to a question he posed
to me privately about MARC, web pages, and SHOE (an ontological mechanism
for HTML).  I am not a member of this list, so please cc: me in any
replies to the list.

On Wed, 21 Aug 1996, Gerry McKiernan wrote:

>    It has occurred to me that perhaps SHOE would be a mechanism by which
> existing Web pages could easily be 'annotated' with appropriate
> HTML-MARC coding.
>
>    I'd appreciate any reaction to the possibilities of a SHOE MARC
> and related musings.

So that anyone who reads this might know what the heck I'm even talking about
:-) perhaps it's best to peruse the SHOE home page at:

        http://www.cs.umd.edu/projects/plus/SHOE/

Sure, MARC records can be done quite nicely in SHOE.  Attributes would
fit nicely, etc.  SHOE is semantically easily a superset of anything
handled in MARC and LC.  If you're interested in embedding MARC or the LC
subject headings etc. in a solid framework on the web, I think SHOE would
be an excellent choice.

The big question is whether putting MARC and LC (as they stand) on the web
really is a good idea.  I argue that it really is not.

I don't know MARC very well, but to my mind there are two problems with
MARC as applied to the web in particular.

        - MARC is an aging scheme (computer-wise) born in the age of big-iron
          machines, that is beginning to creak a bit as a format.  MARC as a
          raw format simply doesn't fit well within the HTTP/HTML/MIME
          triumvirate.

        - As I understand it, MARC was originally designed for books and
          journals, though it can handle (with more or less success) other
          kinds of archived objects.  However, MARC's fields (and lack of
          specific fields) and overall design seem to me to be geared to
          archiving elements with certain assumptions, namely that those
          elements are:

                tangible and physical
                not rapidly changing (at least not faster than MARC can keep up)
                consistently available (at least to the archiver)
                existent, except after some mishap
                "truthful"--that is, the MARC record is *assumed*
                        to be telling true facts about the elements in question

          Unfortunately, web documents, and especially HTML documents, exhibit
          *none* of these qualities.  I am not convinced that MARC can be
          extended to effectively deal with the issues found on the web;
          archivists on the web must face a singular difference from other
          archivists:  the archival material doesn't "physically"  exist--it
          is in reality raw information tied to no particular tangible entity
          (an actual physical book, for example).  It may be the case that
          some other mechanism might be better suited to dealing with these
          issues (if they really need to be dealt with at all).

          ...and, of course, MARC provides an enormous number of fields that
          are inapplicable to web pages.  Do web pages need ISBN numbers?
          Suggested prices?  etc.  Though this is probably a bit of a
          non-issue...

LC subject headings and the LC classification scheme, to my mind, suffer
as well.  While LC has been at least reasonably successful in handling
subject classification for physical, slowly-changing (as in, not every
minute :-) entities such as found in libraries, I am not convinced that
it is the best option for classifying the web.  I personally believe that
LC is far too large and unwieldy to use in any reasonable classification
on the web.  SHOE attempts to provide a framework designed to fix some of
the resultant problems:

        - Subject headings assume that documents are about *topics*.  But in
          fact the very large majority of documents on the web are not about
          a particular *topic* but purport to *be* that topic.  Consider
          an official home page for a rock band, say, Smashing Pumpkins.
          This page is not about rock bands, or "about" Smashing Pumpkins.
          Instead, the page purports to in some sense "be" the Smashing
          Pumpkins--"Here's our web page!  You can get ahold of us here!"

          A more appropriate classification scheme, as I see it, would be
          something that lets web pages declare data-entities that represent
          something real, not topical.  For example, in SHOE a web page
          may create a data-entity which declares that it (the data-entity,
          not the web page) "is" a graduate student and a research assistant,
          that its "name" is "Sean Luke", etc.

        - LC is a single hierarchy, maintained by a single body.  As a result
          LC is hopelessly unable to keep up with the fast-changing topics
          in real life.  This is a common reason why many fast-changing
          disciplines (Computer Science, for example) have abandoned LC
          entirely just to get a handle on their own work.

          SHOE addresses this by allowing ontologies to extend one another;
          hence someone like the Library of Congress might dictate a common
          abstract, high-level ontology defining overall groups and topics,
          and someone like the ACM could extend those to fit its current
          and fast-changing needs in a narrow, specialized area.

        - LC is solely a categorization scheme, with no ability to define
          relationship attributes to those categories.  Hence when LC needs
          to describe Apple Computer, they might have to create a special
          category called "Computer Company", rather than the (better and
          simpler) method of just calling it a "Company", and creating a
          "deals-in" relationship to "Computers".

          SHOE provides sophisticated relationships, both real and inferred,
          between data-entity instances, categories, raw data (numbers,
          strings, dates, booleans, etc.), etc. While there are a few
          weaknesses that I hope to address in the next incarnation of the
          spec, I nonetheless think this is a much stronger framework on
          which one might build a topical hierarchy, or even (possibly
          better) a relatively flat set of categories coupled with a rich
          set of attributes that enable entities to adequately describe
          themselves without a set of subject headings as large as the number
          of entities themselves.  :-)
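
To make the three points above a little more concrete, here is a minimal
Python sketch of the ideas only.  It is not SHOE syntax and is not taken
from the SHOE spec; the class names (Ontology, Entity), the method names,
and the category and relation labels are all hypothetical, chosen just to
mirror the examples in the text: an ontology that extends another one, a
data-entity that declares what it "is", and a "deals-in" relationship used
in place of a special-purpose category.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A named vocabulary that may extend other ontologies (hypothetical model)."""
    name: str
    categories: set[str] = field(default_factory=set)
    relations: set[str] = field(default_factory=set)
    extends: list["Ontology"] = field(default_factory=list)

    def all_categories(self) -> set[str]:
        # Own categories plus everything inherited from extended ontologies.
        found = set(self.categories)
        for parent in self.extends:
            found |= parent.all_categories()
        return found

    def all_relations(self) -> set[str]:
        # Own relations plus everything inherited from extended ontologies.
        found = set(self.relations)
        for parent in self.extends:
            found |= parent.all_relations()
        return found


@dataclass
class Entity:
    """A data-entity that describes itself using one ontology's vocabulary."""
    key: str
    ontology: Ontology
    categories: set[str] = field(default_factory=set)
    relations: list[tuple[str, str]] = field(default_factory=list)

    def declare_category(self, category: str) -> None:
        # The entity claims to "be" an instance of this category.
        if category not in self.ontology.all_categories():
            raise ValueError(f"unknown category: {category}")
        self.categories.add(category)

    def relate(self, relation: str, target: str) -> None:
        # Attach a named relationship pointing at a value or another category.
        if relation not in self.ontology.all_relations():
            raise ValueError(f"unknown relation: {relation}")
        self.relations.append((relation, target))


# A broad, high-level ontology (the kind a central body might maintain).
base = Ontology("base", categories={"Company", "Person"},
                relations={"name", "deals-in"})

# A specialized extension (the kind a discipline might maintain for itself).
cs = Ontology("cs-extension",
              categories={"Computers", "GraduateStudent", "ResearchAssistant"},
              extends=[base])

# "Company" plus a "deals-in" relationship, instead of a "Computer Company" category.
apple = Entity("apple", cs)
apple.declare_category("Company")
apple.relate("name", "Apple Computer")
apple.relate("deals-in", "Computers")

# A data-entity that "is" a graduate student and a research assistant.
sean = Entity("sean", cs)
sean.declare_category("GraduateStudent")
sean.declare_category("ResearchAssistant")
sean.relate("name", "Sean Luke")
```

The point of the sketch is only that the extension carries the specialized
categories while reusing the base vocabulary, so the base ontology never has
to grow a category for every combination an entity might need.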

Overall, I'd say that MARC and LC fit nicely in the SHOE ontological
scheme, so if you're interested in pursuing MARC records etc. on the web
rather than going for an alternative mechanism, SHOE would be great.  On
the other hand, I really think that MARC as a standard may not be able to
make the transition to electronic documents because it's getting up there
in age computer-wise, and that LC, all 9 volumes of it now, has grown too
large and monolithic to succeed either.  We need some new, smaller, and
more flexible standards than the stuff put out by the federal
government.  :-)

_____________________________________________________________________________
Sean Luke                   "I've discovered that P==NP, but the proof is too
U Maryland at College Park   large to fit in the margins of this signature."
seanl at cs.umd.edu             URL: http://www.cs.umd.edu/~seanl/

p.s., I should mention that there are other, VERY sophisticated ontological
schemes for computer data out there, mostly out of the Knowledge
Representation (AI) community.  The big one right now is probably KIF,
with KQML as a transfer protocol (the KR equivalent of HTML, using HTTP
as its protocol).  There are a number of arguments both for and against
KIF (as opposed to simpler, more streamlined but less expressive
mechanisms like SHOE), but that's probably beyond the purview of this
discussion.

------- End of Forwarded Message