[Web4lib] MARC strictness

Tue Nov 29 11:20:04 EST 2005

Just quick notes...

Relating to another posting: ISBD punctuation has been the bane of people
handling MARC records for a very long time; you have to design display
rules to stay out of its way, which sometimes interferes with making data
look right. I remember the justification for ISBD punctuation being "so you
could determine the bibliographic elements when the cataloging is in a
language you don't read"--you know, because there's always a
space-semicolon-space after this element, etc. That always struck me as a
vaguely improbable scenario: "I don't know what language that book is in,
but there's the statement of responsibility!" (But hey, I'm definitely not
a cataloger.)
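
To make the "stay out of its way" problem concrete, here is a minimal sketch of the kind of display rule involved. It assumes subfield values arrive as plain strings that still carry their trailing ISBD separators. The function name strip_isbd and the sample values are illustrative only, not taken from any real display spec.

```python
import re

# Trailing ISBD separators that commonly end a subfield value:
# slash, colon, semicolon, equals sign, or comma (with optional spaces).
# A trailing period is deliberately left alone, because it may belong
# to an abbreviation or initial rather than to the punctuation scheme.
_TRAILING_ISBD = re.compile(r"\s*[/:;=,]\s*$")


def strip_isbd(value):
    """Return the value with one trailing ISBD separator removed, if present."""
    return _TRAILING_ISBD.sub("", value)


# Illustrative values in the style of 245 $a and $c.
title = strip_isbd("The abolition of man /")
responsibility = strip_isbd("C.S. Lewis.")
```

Even this small rule shows the trade-off: a title that legitimately ends in a colon or slash would be trimmed too, so a real rule set needs exceptions.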

As for "us all" using templates and/or having multiple people to design
MARC displays: Generalizations are usually tricky. The display
specifications for Eureka databases (including the RLG Union Catalog) have
always been based directly on MARC fields, subfields, and indicators; we
certainly don't have the luxury of "normalizing" those records in any
organized manner (since the database is being updated daily); and the
personpower available to write that spec has never been more than part of
one person (me), with review and occasional assistance from a variety of
others. Admittedly, that spec was based on considerably more than a decade
of working with and explaining MARC. (Implementation of that spec has
typically involved part of one other person.)
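
As a rough illustration of what "based directly on MARC fields, subfields, and indicators" can mean, here is a small sketch of a rule-driven display. It is not the Eureka spec: the rule table, the labels, the record layout (a list of tag, indicator 1, indicator 2, subfields tuples), and the sample data are all assumptions made up for the example.

```python
# Hypothetical display rules. "ind2" of None means "any second indicator".
DISPLAY_RULES = [
    {"tag": "245", "ind2": None, "label": "Title", "codes": "abc", "joiner": " "},
    # In field 650, second indicator 0 marks Library of Congress Subject Headings.
    {"tag": "650", "ind2": "0", "label": "Subject (LCSH)", "codes": "avxyz", "joiner": " -- "},
]


def render(record, rules=DISPLAY_RULES):
    """Build labeled display lines, in rule order, from the fields that match."""
    lines = []
    for rule in rules:
        for tag, ind1, ind2, subfields in record:
            if tag != rule["tag"]:
                continue
            if rule["ind2"] is not None and ind2 != rule["ind2"]:
                continue
            # Keep only the subfields the rule asks for, in record order.
            parts = [value for code, value in subfields if code in rule["codes"]]
            if parts:
                lines.append(rule["label"] + ": " + rule["joiner"].join(parts))
    return lines


# Made-up sample record for illustration.
sample = [
    ("245", "1", "4", [("a", "The abolition of man /"), ("c", "C.S. Lewis.")]),
    ("650", " ", "0", [("a", "Ethics"), ("x", "Study and teaching")]),
]
display_lines = render(sample)
```

Working straight from the tagged data like this means no normalized copy of the record has to be kept in step with a database that changes daily.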

The spec for the Eureka Full display is 18 single-spaced pages...and has
evolved over the past decade, albeit slowly. [Actually, 18 pages isn't
quite right: That doesn't include MARC holdings, which at one point had a
separate five-page spec.]

I keep hearing about a "more semantically rich" format than MARC21--at the
same time I hear that all we need is "a single line of undifferentiated
keywords and identifiers." The odds of widespread adoption of a more
semantically rich format strike me as similar to the odds of universal
success of the Semantic Web (that is, slender); what would be lost in
abandoning the structure defined by MARC might not matter for some users
but would certainly hamper in-depth research by specialists...

As for strictness, though: MARC itself requires an 001 field (record ID),
an 008 (various fixed-length control elements, most of which can be
dummied) and a 245 (title, which may be a cataloger-supplied title). All
else is optional.
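
Taking that statement at face value, a check for the bare minimum is very short. This is a sketch under stated assumptions: the record is a plain dict keyed by tag (a made-up layout, not any library's API), and the sample values are placeholders rather than valid field content.

```python
# The three tags named above as required.
REQUIRED_TAGS = ("001", "008", "245")


def missing_required(record):
    """Return the required tags that are absent or empty in the record."""
    return [tag for tag in REQUIRED_TAGS if not record.get(tag)]


# Made-up minimal record: placeholder ID, a dummied 40-character 008,
# and a cataloger-supplied title.
minimal = {
    "001": "example-0001",
    "008": "|" * 40,
    "245": "[Untitled pamphlet]",
}

problems = missing_required(minimal)
incomplete = missing_required({"001": "example-0002"})
```

Anything stricter than this (required subfields, valid 008 positions, local policy) is a matter of the checks a given system chooses to add on top.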

Walt Crawford
wcc at rlg.org, 650-691-2227
-------------------------------------
Typically reachable:
Monday & Wednesday 7 a.m.-3 p.m.
Tuesday & Thursday 7 a.m.-2 p.m.
Friday 7-11 a.m.
--------------------------------------

web4lib-bounces at webjunction.org wrote on 11/28/2005 03:51:19 PM:

> Hi,
>
> On 11/29/05, Mike Taylor <mike at miketaylor.org.uk> wrote:
> > This tells me that all MARC records could be replaced by a single line of
> > undifferentiated keywords and identifiers, like this: "c s lewis the
> > abolition of man moral law subjectivism 0060652942".
> >
> > No!  Don't shoot me!  I'm only joking!  I think!
>
>   1. You're both right and wrong.
>   2. We all love and hate MARC at the same time.
>     and
>   3. Welcome to metadata heaven *and* hell.
>
> Very concrete, isn't it? :)
>
> > What it really _does_ show -- I think -- is that _for the purposes of
> > Amazon-like searching_, this ultra-weak metadata suffices.  The
> > question is what proportion of all catalogue searching is in this
> > sense "Amazon-like", and my feeling is that the answer is very close
> > to 100% of it.  Not quite 100%, though: sometimes you really do need
> > to differentiate between searching for books _written by_ Winston
> > Churchill and books _about_ Winston Churchill.
>
> I think you're taking MARC too literally. You have to remember that it
> is a 30-year-old culture more than a strict standard, and I and my
> colleagues certainly treat it that way. No one handles MARC out of the
> box; there are normalisation filters and procedures it has to go
> through, lots of general second-guessing meaning and some black magic
> thrown in to work out if the identification of anything within the
> record is usable.
>
> > Finally let me also say that of course metadata has other uses as well
> > as searching.  Roughly, the other half of the equation is retrieval,
> > or display.  But again, I find myself thinking that the world probably
> > needs rather less in the way of structure here than we information
> > professionals tend to want to give them.
>
> MARC is simply wonderful ... *when* you know how to handle it! If you
> just use it out of the box, you *will* get into trouble. You need to
> define a good measure of normalisation and cleaning up. There are a few
> projects around that do that. Just to give you a good idea, we've
> had three dedicated developers full time for the last year who have
> created such a normalisation process, and we're still not happy with
> it. It's a known problem within the library world, which is the very
> reason a lot of us want to push towards a more semantically rich
> format. But of course, it ain't happening any time soon.
>
> Good luck.
>
>
> Alex
> --
> "Ultimately, all things are known because you want to believe you know."
>                                                          - Frank Herbert
> __ http://shelter.nu/ __________________________________________________