[Web4lib] MARC strictness

Lars Aronsson lars at aronsson.se
Mon Nov 28 05:50:38 EST 2005


I'm looking at a set of MARC records from a library near me.  
Their cataloging guidelines are a very close translation of the 
Library of Congress' MARC21 guidelines, but there also seems to be 
a lot of local tradition that isn't written down in any document.

My experience (and I should point out that I'm a programmer, not a 
librarian) tells me that people follow formatting rules when it 
matters, but not otherwise.  Statements in C, Java, and Perl 
programs end in a semicolon, or else the programs refuse to run.  
But not all programs are well structured or easy to understand.  
And this seems to apply to MARC records as well.

The search interface to this library's catalog seems to treat 
every subfield the same way.  In the personal name fields (100, 
600, 700), I sometimes see subfields $c (title) and $d (years of 
birth and death) interchanged:

   100 1  $a Meriam, James Lathrop, $c 1917-2000.
   100 0  $a Husayn ibn Ali, $d King of Hejaz, $c 1853?-1931.
   700 1  $a Barth $d Professor $4 aut

In the first two examples, if the subfield markers are removed, 
the remainder is a human-readable line of text with commas and a 
period at the end.  This is the more common case, but the third 
example doesn't have this punctuation. Is there a rule for this?
In trying to clean up the records, simply removing the comma or 
period at the end of a subfield is insufficient, because there are 
cases such as "$c Dr." or "$a Eliot, T. S." where the period 
should be part of the subfield.
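
A minimal Python sketch of such a cleanup rule, to show why it 
needs exceptions.  The function name and the KEEP_PERIOD list are 
hypothetical, not taken from any MARC tool, and a real list of 
abbreviations would have to be built from the records themselves:

```python
import re

# Hypothetical list of abbreviations whose final period is data,
# not punctuation between subfields. Extend it from the actual records.
KEEP_PERIOD = {"Dr.", "Jr.", "Sr.", "Prof.", "B.C.", "f.Kr."}

# A single letter followed by a period, e.g. the "S." in "Eliot, T. S."
INITIAL = re.compile(r"[^\W\d_]\.")


def strip_trailing_punct(value):
    """Drop a trailing comma or period unless it belongs to the data."""
    text = value.rstrip()
    if text.endswith(","):
        return text[:-1].rstrip()
    if text.endswith("."):
        last_word = text.split()[-1]
        # Keep known abbreviations and single-letter initials intact.
        if last_word in KEEP_PERIOD or INITIAL.fullmatch(last_word):
            return text
        return text[:-1].rstrip()
    return text


for raw in ["Meriam, James Lathrop,", "1917-2000.", "Dr.", "Eliot, T. S."]:
    print(repr(raw), "->", repr(strip_trailing_punct(raw)))
```

This is only a heuristic: a name that ends in an abbreviation 
missing from the list would still lose its period.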

The contents of subfield $d also vary greatly, e.g. the English 
"fl." (flourished) is mixed with the Swedish "levde", or the 
English "B.C." with the Swedish "f.Kr.", or there are more 
complicated statements such as "was born no later than 1751".  
Circa can be abbreviated "c." (as in English) or "ca" or "c:a" (as 
in Swedish).  Then there is the simple question mark after 1853 in 
the example above.  In LoC's guidelines, I find no rules for the 
text inside the $d subfield.
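
As a rough illustration, the variants listed above could be mapped 
to one form with a small replacement table.  This is a sketch in 
Python; the choice of "fl.", "B.C." and "ca." as target forms is 
an assumption, not something the guidelines prescribe, and the 
table covers only the variants mentioned here:

```python
import re

# Hypothetical mapping from observed variants to one chosen form.
REPLACEMENTS = [
    # Swedish "levde" -> English "fl."
    (re.compile(r"\blevde\b", re.IGNORECASE), "fl."),
    # Swedish "f.Kr." -> English "B.C."
    (re.compile(r"\bf\.\s?Kr\.", re.IGNORECASE), "B.C."),
    # "c:a", "ca", "ca." or "c." before a space or digit -> "ca."
    # The lookbehind avoids touching the "C." inside "B.C.".
    (re.compile(r"(?<![\w.])(?:c:a|ca\.?|c\.)(?=\s|\d)", re.IGNORECASE), "ca."),
]


def normalize_dates(value):
    """Rewrite known variants in a $d value; leave everything else alone."""
    out = value
    for pattern, replacement in REPLACEMENTS:
        out = pattern.sub(replacement, out)
    return out


for raw in ["levde 1650", "400 f.Kr.", "c:a 1751", "ca 1751", "1853?-1931"]:
    print(repr(raw), "->", repr(normalize_dates(raw)))
```

Free-text statements such as "was born no later than 1751" pass 
through unchanged and would still need a human to look at them.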

Apparently, all these formatting inconsistencies exist because it 
really doesn't matter.  You can search for "Lathrop 1917" or "King 
Husayn ibn Ali" and you find what you're looking for.  Nobody 
would search for people having the title 1917.
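
The same observation suggests a simple way to flag interchanged 
subfields: a title should not look like a year, and a dates 
subfield should contain one.  The Python sketch below assumes each 
field has already been parsed into a dictionary of subfield code 
to value, which is a simplification (MARC subfields can repeat), 
and the function names are hypothetical:

```python
import re

# Three or four consecutive digits are taken as a sign of a year.
YEAR_LIKE = re.compile(r"\d{3,4}")


def looks_like_dates(value):
    """Return True if the value contains something resembling a year."""
    return bool(YEAR_LIKE.search(value))


def check_name_field(subfields):
    """List suspected $c/$d mix-ups in one personal name field."""
    problems = []
    title = subfields.get("c")
    dates = subfields.get("d")
    if title is not None and looks_like_dates(title):
        problems.append("$c looks like dates: " + title)
    if dates is not None and not looks_like_dates(dates):
        problems.append("$d contains no year: " + dates)
    return problems


records = [
    {"a": "Meriam, James Lathrop,", "c": "1917-2000."},
    {"a": "Husayn ibn Ali,", "d": "King of Hejaz,", "c": "1853?-1931."},
    {"a": "Barth", "d": "Professor", "4": "aut"},
]
for record in records:
    print(record["a"], check_name_field(record))
```

A check like this only reports suspects.  Whether to swap the 
subfields automatically is a separate decision.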

Is this kind of inconsistency a problem, and how do libraries 
handle it?  Do you insist that such errors be corrected (and how 
do you justify this requirement?), or have you long since given 
up that fight?



-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se

