[Web4lib] MARC to XML: the agony and the ecstasy

Wed Oct 10 10:08:44 EDT 2007

---- Original message ----
>Date: Wed, 10 Oct 2007 09:24:38 -0400
>From: "K.G. Schneider" <kgs at bluehighways.com>  
>Subject: [Web4lib] MARC to XML: the agony and the ecstasy  
>To: web4lib at webjunction.org
>
>For a presentation, I'm trying to come up with examples of what happens
>when MARC is transformed to XML and then exposed through search engines
>such as Endeca, Siderean, etc. ... examples that work well for
>librarians who are at least a little familiar with MARC. Part of the
>message I'm getting across is that MARC is very record-oriented and XML
>is about data (MARC is from Mars... XML is from Venus?); but I'm also
>trying to suggest that when we start exposing MARC in new webby
>environments sometimes we get some new abilities, but other times the
>results are not so pretty, due to limitations of the source data, even
>though it's now XML. 
>

Hmmmm, interesting topic.  I don't have any examples off the top of my head, although I do have a suggestion.  XML and MARC are at opposite ends of at least one metric: how many software tools there are that can manipulate, edit, index, and store the format.  There are a few very specialized MARC tools and a lot of general XML tools.  Maybe a side-by-side visual comparison of some sort: the left side showing logos of the various software that can work with MARC, along with the number of users, and the right side showing software that can work with XML and its user base.  Might be a lot of work for just one graphic.

Now, I'd imagine that someone would ask how much of this software can work with MARCXML or similar formats, but most XML software has ways of specifying the structure and vocabulary of a particular XML language.  DTDs, RELAX NG, W3C XML Schema, and so on will let you use one editor for all XML documents but change certain behavior depending on the type of XML document.  In fact, when it comes to XML there's actually a danger of being paralyzed by the number of choices, although I find most people with experience seem to be pretty good at navigating through them.
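
To make the "general XML tools" point a bit more concrete, here's a rough sketch in Python that reads a MARCXML fragment using nothing but the XML parser from the standard library, with no MARC-specific code involved.  The record itself is made up for illustration, and the function name title_subfields is just something I picked; the namespace is the MARC21 "slim" one that MARCXML uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-field record, invented purely for illustration.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">MARC to XML :</subfield>
    <subfield code="b">the agony and the ecstasy</subfield>
  </datafield>
</record>"""

# Prefix-to-namespace map used when searching the parsed tree.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def title_subfields(xml_text):
    """Return (code, text) pairs for every subfield of the 245 (title) field."""
    root = ET.fromstring(xml_text)  # root is the <record> element
    path = "marc:datafield[@tag='245']/marc:subfield"
    return [(sf.get("code"), sf.text) for sf in root.findall(path, NS)]


print(title_subfields(MARCXML))
```

Note that the generic parser knows nothing about what a 245 field means; it only sees elements and attributes.  That's the trade-off: lots of tools can open the file, but the MARC knowledge still has to come from somewhere.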

One other little nitpick.  XML isn't necessarily all about the data.  There are plenty of little oddities in XML that only make sense once you realize that SGML, which XML derives from, had a strong focus on marking up documents.  That explains decisions like the verbosity.  A few tags here and there in a large document aren't likely to have a large impact, but using XML for something like a spreadsheet would quickly lead to most of the document being markup and not data.
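
A quick way to see the spreadsheet problem is to serialize a tiny table as XML and count how many characters are markup.  This is only a sketch under my own assumptions: the element names (table, row, cell) and the sample values are invented, and real spreadsheet formats differ, but the proportions tend to look similar whenever the cell values are short.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: three rows of three short cell values.
ROWS = [["2007", "10", "10"], ["2007", "10", "11"], ["2007", "10", "12"]]


def markup_fraction(rows):
    """Fraction of the serialized XML that is tags rather than cell data."""
    table = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for value in row:
            ET.SubElement(row_el, "cell").text = value
    xml_text = ET.tostring(table, encoding="unicode")
    data_chars = sum(len(value) for row in rows for value in row)
    return 1 - data_chars / len(xml_text)


print(f"{markup_fraction(ROWS):.0%} of the characters are markup")
```

With values this short, well over half of the output is tags.  Run the same count on a long prose document with a handful of tags and the fraction drops to almost nothing, which is the document-oriented case the syntax was designed around.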

It's a good question though.  I'll be looking forward to the other responses.

Jon Gorman