[Web4lib] MARC strictness

Mon Nov 28 10:57:35 EST 2005

Hi Lars,

> I'm looking at a set of MARC records from a library near me.  
> Their cataloging guidelines are a very close translation of 
> the Library of Congress' MARC21 guidelines, but there seems 
> to be a lot of built-in tradition too, that isn't covered in 
> documents.

Although it isn't necessarily obvious from looking at a set of MARC
guidelines (whether it's MARC 21 Concise from LOC or OCLC MARC or
whatever), MARC is only supposed to dictate the structure of the
record's content, not the formatting of that content. Whether or not the
100 field is written thusly:

100 1 $a James Lathrop Meriam $d 1917 to 2000 ["incorrect" according to
most libraries' cataloging standards]

is outside the scope of any MARC specification. To get the content
formatting, you have to use a set of cataloging rules. English-speaking
countries use the Anglo-American Cataloguing Rules, Second Edition
(AACR2), a very complex set of rules that tells you exactly how to
format text in a cataloging record (whether or not that record is
encoded using MARC).
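
To make the structure/content split concrete, here is a minimal Python
sketch of my own (not taken from any MARC specification or library). It
assumes the informal "$a value $d value" notation used in this thread
rather than the real binary record encoding, and parse_field is a
hypothetical name. MARC's part is the tag, the indicators, and the
subfield codes; the text inside each subfield is just an opaque string
as far as MARC is concerned.

```python
import re

def parse_field(line):
    """Split a display-style MARC field line into tag, indicators, subfields.

    Assumes the informal "$a value $d value" notation used in this thread,
    not the binary exchange format.
    """
    head, _, rest = line.partition("$")
    parts = head.split()
    tag = parts[0]
    indicators = "".join(parts[1:])
    # Each subfield is a one-character code followed by free text.
    subfields = [(code, value.strip())
                 for code, value in re.findall(r"\$(\w)([^$]*)", "$" + rest)]
    return tag, indicators, subfields

# Both lines parse into the same structure; only the content differs.
print(parse_field("100 1  $a Meriam, James Lathrop, $d 1917-2000."))
print(parse_field("100 1 $a James Lathrop Meriam $d 1917 to 2000"))
```

Both calls succeed, which is the point: nothing at the structural level
can tell the "correct" heading from the "incorrect" one.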

Now, I wasn't even a twinkle in my father's eye when MARC and AACR came
about, but, from what I understand, these cataloging rules come out of
the card catalog era, when you had to be very concise in order to fit
the pertinent metadata on a card. This conciseness translated well to
MARC when it was young due to strict limitations on numbers of
characters for subfields, fields, and records. That's why you have all
the abbreviations, punctuation conventions, etc.

You're absolutely right, however, that there is a lot of tradition built
into most libraries' cataloging practices. Although we all use
MARC/AACR2, we all have our own local practices that--for better or
worse--sometimes contradict these standards. And, because of these
standards' age and the richness of their histories, there can be a lot
of variety in local practice between, and even within, libraries. In a
lot of cases there was at one time a good reason for a particular quirk,
but the reasons have been forgotten or simply no longer apply--and yet
the quirk persists. For some reason, I find the topic of local
cataloging practices, how they developed, and why they exist to be
terribly fascinating, so I apologize if I'm rambling.

> My experience (and I should point out that I'm a programmer, not a
> librarian) tells me that people will follow formatting rules 
> if it matters, but not otherwise.  All C, Java, and Perl 
> programs have statements that end in a semicolon, or else 
> they refuse to run.  
> But not all programs are well structured, or easy to explain.  
> And this seems to apply to MARC records as well.

Hmm. I also have a programming background that predates my librarian
background, and that's a very interesting insight--although MARC/AACR2
provides a lot more structure than do the formatting rules of any
particular programming language.

> The search interface to this library's catalog seems to 
> handle every subfield just the same.  Sometimes in the 
> personal names fields (100, 600, 700), I see subfields $c 
> (title) and $d (years of birth and death) interchanged:
> 
>    100 1  $a Meriam, James Lathrop, $c 1917-2000.
>    100 0  $a Husayn ibn Ali, $d King of Hejaz, $c 1853?-1931.
>    700 1  $a Barth $d Professor $4 aut

Something like this has got to be a mistake. The structure of the
personal name fields is so standardized that switching subfields around
like this could not be an actual practice. It's just too big of an
inconsistency. And this is a MARC inconsistency, not an AACR2
inconsistency.
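
If you want to find records like these automatically, a rough heuristic
might be enough. The Python sketch below is only my guess at an
approach: it assumes each field has already been split into (code,
value) pairs, and it assumes that $d should contain a 3- or 4-digit year
while $c should not. The function name is hypothetical.

```python
import re

YEAR = re.compile(r"\d{3,4}")

def suspicious_subfields(subfields):
    """Return codes of $c/$d subfields whose content looks misplaced.

    Heuristic assumption: $d (dates) should contain a 3-4 digit year,
    and $c (titles and other words) should not.
    """
    flagged = []
    for code, value in subfields:
        has_year = bool(YEAR.search(value))
        if code == "c" and has_year:
            flagged.append(code)
        elif code == "d" and not has_year:
            flagged.append(code)
    return flagged

print(suspicious_subfields([("a", "Meriam, James Lathrop,"),
                            ("c", "1917-2000.")]))
print(suspicious_subfields([("a", "Barth"), ("d", "Professor"),
                            ("4", "aut")]))
```

Expect false positives: a legitimate $d with only a century or a
two-digit year would be flagged too, so the output is a list for a
person to review, not a list to fix blindly.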

> In the two first examples, if the subfield markers are 
> removed, the remainder is a human-readable line of text with 
> commas and a period at the end.  This is the more common 
> case, but the third example doesn't have these commas. Is 
> there a rule for this?

Yes. :-) In a card catalog record, "Meriam, James Lathrop, 1917-2000."
would be written exactly so. This entire string would represent the
author. In MARC, although the dates are split out into their own
subfield, the formatting conventions persist. The last example, AFAIK,
isn't meant to be read as a single string, so each subfield is just a
separate piece of data, hence the lack of punctuation.
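
You can see the card-catalog heritage by gluing the subfields back
together. This small Python sketch assumes (code, value) pairs as input
and, as a further assumption of mine, leaves out digit-coded subfields
such as $4 on the theory that they are control data rather than part of
the displayed heading. The function name is hypothetical.

```python
def card_style_heading(subfields):
    """Join the text subfields of a name field into one display string.

    Assumption: digit-coded subfields (such as $4) are control data
    and are left out of the displayed heading.
    """
    return " ".join(value for code, value in subfields
                    if not code.isdigit())

print(card_style_heading([("a", "Meriam, James Lathrop,"),
                          ("d", "1917-2000.")]))
print(card_style_heading([("a", "Barth"), ("d", "Professor"),
                          ("4", "aut")]))
```

The first heading comes out as a ready-made card line because the
punctuation is stored inside the subfields; the second comes out as bare
words because nobody entered any.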

Although I am not an expert on MARC/AACR2, conventions similar to those
in your first two examples also show up in the title fields, the
publication information fields, and the physical description fields,
among others.

So, a lot of these formatting conventions come out of cataloging
tradition. Where there are no traditions to guide them, there are no
strange looking formatting conventions.

If you're interested, I would find a copy of AACR2, or at least a
concise version of it. The book that helped me tremendously is "The
Concise AACR2" by Michael Gorman (yeah, yeah, I know...). It's currently
in its 4th edition.

> In trying to clean up the records, simply removing the comma 
> or period at the end of a subfield is insufficient, because 
> there are cases such as "$c Dr." or "$a Eliot, T. S." where 
> the period should be part of the subfield.
> 
> The contents of subfield $d also varies greatly, e.g. the 
> English "fl." (flourished) is mixed with the Swedish "levde", 
> or the English "B.C." with the Swedish "f.Kr.", or more 
> complicated statements such as "was born no later than 1751". 
>  Circa can be abbreviated "c." (as in English) or "ca" or 
> "c:a" (as in Swedish). 
> Or the simple question mark after 1853 in the example above. 
> In LoC's guidelines, I find no rules for the text inside the 
> $d subfield.

:-) Yes. Again, that's AACR2's job to define the text inside a subfield,
not MARC's (with some exceptions). The examples given in LOC's
guidelines are formatted according to AACR2, I believe, just because
that's what everyone uses.

Attempting to automatically process the content of human-created MARC
records is going to give you the headache to end all headaches, because
cataloging rules, even within a single standard, are not consistent--at
least, not by a computer's definition of "consistent."
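
For the handful of $d variants you listed, a lookup table of patterns
will get you part of the way. The Python sketch below is an
illustration only: the target spellings ("fl.", "B.C.", "ca.") are my
assumption about what you would want to normalize to, the names are
hypothetical, and free-text statements like "was born no later than
1751" pass through untouched, which is exactly where the headache
starts.

```python
import re

# Hypothetical mapping of the variants mentioned in this thread
# to a single spelling each.
VARIANTS = [
    (re.compile(r"\blevde\b", re.IGNORECASE), "fl."),
    (re.compile(r"\bf\.\s?Kr\.", re.IGNORECASE), "B.C."),
    # "c:a", "ca", "ca." or "c." when followed by a number
    (re.compile(r"\b(?:c:a|ca\.?|c\.)(?=\s*\d)", re.IGNORECASE), "ca."),
]

def normalize_dates(value):
    """Rewrite known date qualifiers; leave everything else alone."""
    for pattern, replacement in VARIANTS:
        value = pattern.sub(replacement, value)
    return value

for raw in ["levde 1650", "c:a 1751", "ca 1751", "300 f.Kr.",
            "was born no later than 1751"]:
    print(repr(raw), "->", repr(normalize_dates(raw)))
```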

> Apparently, all these formatting inconsistencies exist 
> because it really doesn't matter.  You can search for 
> "Lathrop 1917" or "King Husayn ibn Ali" and you find what 
> you're looking for.  Nobody would search for people having 
> the title 1917.

Right. I'm not entirely sure how most library systems index MARC
records, but I imagine that they would have to ignore
formatting--otherwise searching would be impossible.
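
Here is my guess, in Python, at what formatting-blind keyword indexing
could look like. To be clear, this is not how any particular library
system works; it is a sketch that assumes (code, value) pairs as input,
throws away the subfield codes and punctuation, and matches on
lowercase word tokens. The function names are hypothetical.

```python
import re

def index_terms(subfields):
    """Lowercase every subfield value and split it into word tokens."""
    text = " ".join(value for _code, value in subfields).lower()
    return set(re.findall(r"\w+", text))

def matches(query, subfields):
    """True if every word of the query appears somewhere in the field."""
    return set(re.findall(r"\w+", query.lower())) <= index_terms(subfields)

meriam = [("a", "Meriam, James Lathrop,"), ("c", "1917-2000.")]
husayn = [("a", "Husayn ibn Ali,"), ("d", "King of Hejaz,"),
          ("c", "1853?-1931.")]
print(matches("Lathrop 1917", meriam))
print(matches("King Husayn ibn Ali", husayn))
```

With an index like this, the swapped $c and $d make no difference to
the search result, which would explain why nobody noticed.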

The next logical question, then, is: why is it so important to
catalogers that every comma, period, capitalized letter, etc. is in the
right place? Well, beyond following the "standard" for its own sake
(whether that's AACR2 or some local practice), I really don't know.

I think this is part of the reason that catalogers look so suspiciously
on "metadata," and why those of us who come from a more IT-ish
background can get so frustrated when dealing with metadata in a library
setting. Metadata really does not need to be (and really *should not*
be) as complicated as some catalogers--at least, in my experience--would
like to make it. Of course, I don't think this is their fault. I think
it's just an effect of dealing with a metadata standard as complex and
arcane as MARC for an extended period of time.

> Is this kind of inconsistency a problem, and how do libraries 
> handle it?  Do you insist that such errors be corrected (and 
> how do you motivate this requirement?), or have you long 
> since given up that fight?

I'm not a part of this at our own library, so I can't give a very
detailed answer. But I know that our catalogers attempt to make all the
records they create--whether it's original cataloging or modifying
records downloaded from OCLC or a vendor--conform to local practices and
"correct" cataloging rules, whenever possible. If there's a major
problem with a record that's already in our catalog, the problem is
brought to the attention of our database management team and they try to
fix whatever is causing the problem in the record. Doubtless there are
many, many records with problems that are still waiting to be found.

How do library *systems* deal with this level of inconsistency? I
imagine it varies from system to system.

Does that help?

Jason Thomale
Metadata Librarian
Texas Tech University Libraries