[WEB4LIB] MARC -> XML

Joerg Messer joerg.messer at ubc.ca
Wed Mar 23 19:23:29 EST 2005


Hi Anita,

I've been doing a little hacking using Python, the Zoom PyZ3950 package and
the Amara XML tools. I don't know if my sample code is of any use, but I've
included it below. I'm a neophyte in this area, so I make no claims that this
is the preferred way of doing things.  If you come up with a better toolset,
I'm all ears.

-----------------
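# zoom (Z39.50 client) and zmarc (MARC parsing) both ship with PyZ3950
from PyZ3950 import zoom, zmarc
from amara import binderytools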

# Connect to the Library of Congress Z39.50 server and ask for MARC records
conn = zoom.Connection ('z3950.loc.gov', 7090)
conn.databaseName = 'Voyager'
conn.preferredRecordSyntax = 'USMARC'

# Title search for "law" using CCL query syntax
query = zoom.Query ('CCL', 'ti=law')
results = conn.search (query)
print "Number of results: " + str(len(results))
count = 0

for result in results:

   count = count + 1

   print "-------------------------------------------"
   print "Record: " + str(count)
   print "-------------------------------------------"

   raw = result.data

   # Convert to MARC
   marcdata = zmarc.MARC(raw)
   # print marcdata

   # Convert to MARCXML
   marcxml  = marcdata.toMARCXML()
   print marcxml

   # Remove non-ascii characters (these cause problems for Amara).
   # Note: 'ignore' silently drops them, so accented letters are lost.
   marcxmlascii = unicode(marcxml, 'ascii', 'ignore').encode('ascii')
   # print marcxmlascii

   doc = binderytools.bind_string(marcxmlascii)

   #print "[" + doc.record.leader.xml_text_content() + "]"


   i = doc.xml_xpath("//datafield[@tag='020']/subfield[@code='a']")
   if len(i)>0:
       isbn = i[0].xml_text_content()
       print "  ISBN: " + isbn

   t = doc.xml_xpath("//datafield[@tag='245']/subfield[@code='a']")
   if len(t)>0:
       title = t[0].xml_text_content()
       print " Title: " + title

   a = doc.xml_xpath("//datafield[@tag='100']/subfield[@code='a']")
   if len(a)>0:
       author = a[0].xml_text_content()
       print "Author: " + author

   print "-------------------------------------------"

conn.close ()
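
If Amara isn't available, the field-extraction step above can probably be done
with the standard library alone. What follows is only a rough sketch, not a
tested replacement: it uses xml.etree.ElementTree instead of Amara, the sample
record is made up for illustration, and the helper names (local_name,
first_subfield, summarize) are my own inventions. It matches on local element
names so that it works whether or not the MARCXML carries the
http://www.loc.gov/MARC21/slim namespace.

```python
import xml.etree.ElementTree as ET

# Made-up MARCXML record, for illustration only.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">0123456789</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def first_subfield(root, tag, code):
    """Return the text of the first matching subfield, or None."""
    for elem in root.iter():
        if local_name(elem.tag) == 'datafield' and elem.get('tag') == tag:
            for sub in elem:
                if (local_name(sub.tag) == 'subfield'
                        and sub.get('code') == code):
                    return (sub.text or '').strip()
    return None


def summarize(marcxml):
    """Pull ISBN (020$a), title (245$a) and author (100$a) from one record."""
    root = ET.fromstring(marcxml)
    return {
        'isbn': first_subfield(root, '020', 'a'),
        'title': first_subfield(root, '245', 'a'),
        'author': first_subfield(root, '100', 'a'),
    }


if __name__ == '__main__':
    info = summarize(SAMPLE_MARCXML)
    for label in ('isbn', 'title', 'author'):
        print('%6s: %s' % (label, info[label]))
```

In the loop above, summarize() would take the place of the bind_string and
xml_xpath calls. A missing field comes back as None rather than an empty list,
so the len() checks would become "is not None" checks.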


Anita Chiodo wrote:
> Hi,
> I'm looking for information on best method (easiest/fastest/cleanest) to
> convert MARC to XML. Can anyone help guide me to any software packages
> or resources that are available?
>
> I've been through the web4lib archives and attempted to access
> information via LOC (received page errors); I've had little success with
> both.
>
> Sincerely,
> Anita
> Anita Chiodo, M.S.L.S.
> Manager, Library Services
> BrittleBOOK.com/BookARCHIVE.com
> Local Phone: 319-390-9442 x24
> Toll Free: 888-870-0484 x24
> Email: achiodo at newspaperarchive.com
>

-- 
Joerg Messer
Programmer/Analyst
UBC Library Systems
2206 East Mall
Vancouver, B.C. Canada V6T 1Z3
T: +1.604.822.5091
F: +1.604.822.3201
W: www.library.ubc.ca
E: joerg.messer at ubc.ca


