[WEB4LIB] Call Number/Letter Regular Expressions?

Kyle Banerjee kyle.banerjee at state.or.us
Thu Jul 25 19:51:49 EDT 2002


> I'm wondering if anyone out there has written any regular expressions to
> identify call numbers from a string of letters/numbers. I am looking for a
> set of regular expressions that can identify LC, DDC, ISSN, and the more
> common sets of numbers used in cataloging.

    Shouldn't be too hard. It is possible to generate more accurate regular
expressions than the ones below, but these are quick 'n easy. First of all,
strip hyphens out of the input and normalize everything to upper case to
simplify matching. Note the carets at the beginning of the line. They are
important. To detect:

LC call number: "^[A-Z]{1,3}[1-9]" (1-3 letters followed by a number between
1 and 9 followed by anything)
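SUDOC: ":" (a colon anywhere in the line -- if you use this as a match
method, the sudoc check must take precedence over the LC check, either by
testing for sudoc first or by letting a later sudoc match override an LC
result. Otherwise, many sudocs will incorrectly be detected as LC)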
DDC: "^[0-9]{3}\.?[0-9]*$" (3 digits, then an optional dot, then any further
digits up to the end of the line)
ISSN: "^[0-9]{7}[0-9X]$" (7 digits followed by another digit or X)
ISBN: "^[0-9]{9}[0-9X]$" (9 digits followed by another digit or X)
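
    As a rough illustration, here is how those checks might be strung
together in Python. This is only a sketch: the names normalize, classify and
PATTERNS are made up for the example, the first match wins, and the ordering
is my own assumption (the sudoc colon check goes ahead of LC so that it takes
precedence, and ISSN/ISBN go ahead of DDC because a string of nothing but
digits also fits the DDC pattern). The ISSN pattern accepts X as the final
check character, as the ISBN one does.

```python
import re

# Ordered (label, pattern) pairs; the first pattern that matches wins.
# The ordering is an assumption of this sketch, not a fixed rule:
#   - SUDOC before LC, so the colon check takes precedence
#   - ISSN and ISBN before DDC, since all-digit input also fits DDC
PATTERNS = [
    ("SUDOC", re.compile(r":")),
    ("LC", re.compile(r"^[A-Z]{1,3}[1-9]")),
    ("ISSN", re.compile(r"^[0-9]{7}[0-9X]$")),
    ("ISBN", re.compile(r"^[0-9]{9}[0-9X]$")),
    ("DDC", re.compile(r"^[0-9]{3}\.?[0-9]*$")),
]


def normalize(raw):
    """Strip hyphens and surrounding whitespace, then upper-case the input."""
    return raw.replace("-", "").strip().upper()


def classify(raw):
    """Return the label of the first matching pattern, or None if none match."""
    text = normalize(raw)
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return None


if __name__ == "__main__":
    for sample in ["qa76.73", "Y4.J89/1:S.HRG.107", "0317-8471", "025.431"]:
        print(sample, "->", classify(sample))
```

    If the order of the list is changed, ambiguous input (an 8-digit string,
for instance) will come back with a different label, so the order is worth
adjusting to whatever schemes actually show up in your data.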

    The expressions you use may depend on which call number schemes you're
trying to match. For example, if sudoc isn't the only scheme that contains a
colon, you'll need to use some more robust expressions. Unless you like
looking at screwy code, it's a good idea to simplify the input to meet your
needs. For example, strip Cutter values, local stamps, dates, etc. before
evaluating. Since the Cutter is separated from the call number by a space,
this is pretty easy.
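
    A minimal sketch of that simplification in Python, assuming the part you
want to keep is everything before the first space (class_part is a made-up
name for the example). It does not handle a call number whose class portion
itself contains a space, so treat it as a starting point only.

```python
def class_part(call_number):
    """Return the text before the first run of whitespace ("" if blank)."""
    parts = call_number.split(maxsplit=1)
    return parts[0] if parts else ""


if __name__ == "__main__":
    # Cutter, date, and anything else after the first space are dropped.
    print(class_part("QA76.73 .P98 2005"))
    print(class_part("025.431 D519"))
```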

    Note that different languages will give you slightly different
capabilities with regard to regular expressions. I'd recommend just surfing
the web for a quick tutorial on the subject and playing around.

kyle

***********************************************
Kyle Banerjee
Oregon State Library
250 Winter ST
Salem, OR 97310-0640
(503)378-4243 ext. 260
kyle.banerjee at state.or.us



