SGML for Web Pages

Thomas W. Eland tweland at mm.com
Mon Dec 18 19:36:43 EST 1995


Robb Scholten writes:

     "I will wager that most of what is published today electronically will be 
     'lost' a hundred years from now.  This very fact should strike some awe and 
     trepidation in the hearts of librarians who are committed to the task of 
     archiving information."

     "The digital revolution requires some superhuman intervention on someone's 
     part to capture information for posterity.  Open standards that are 
     easily upgraded will make our lives a hell of a lot easier in the decades 
     to come.  Is HTML the best standard for display of text and graphics?  I 
     don't know, but I wouldn't rush to transform my entire library into web 
     pages."

I think he is absolutely right.  HTML is not the standard that we want to use to
mark up important textual information.  It can be used for many things, like
indexes, home pages, etc.  In fact, the index I created for Literacy and Adult
Education materials exists in two versions: one in HTML for today's web
browsers, and one built on our own DTD, which we wrote using an SGML
author/editor.  Our DTD is much more sophisticated than HTML and takes advantage
of SoftQuad's new Panorama SGML browser.  (For any of those interested, you can
find both versions of the Internet Directory of Literacy and Adult Education
Resources at http://novel.nifl.gov/litdir/index.html.  You can link to SoftQuad,
download a shareware copy of Panorama, and see the difference.)

Anyway, I think it is up to the library community to come up with or support a
standard, like we did with MARC, that we will use to create and archive
electronic information.  I would encourage every librarian interested in this
issue to get their hands on the "Guidelines for Electronic Text Encoding and
Interchange" (TEI), which was produced by the Association for Computers and the
Humanities, the Association for Computational Linguistics, and the Association
for Literary and Linguistic Computing.  TEI, like HTML, is defined as an SGML
DTD.  However, the guidelines come in two volumes (about twice as long as AACR2)
and provide for complex mark-up.  The real beauty of TEI for librarians is that
it allows the user to embed all the descriptive and subject elements that one
would use to create a MARC record in the text itself (actually in the text
header).  There is no need to create a separate MARC record that links to an
electronic text.  This provides for efficiency as well as portability.  And
since TEI is an application of SGML, it conforms to the SGML standard.
Personally, I think TEI should be taught in library school cataloging courses.
TEI may not be the complete answer, but it goes a long way.  It even contains a
section discussing header elements and their relationship to the MARC record for
those who wish to load TEI independent headers into MARC-based retrieval
systems.
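
To give a rough idea of what carrying the cataloging data in the header looks
like, here is a minimal sketch in Python.  Everything in it is an assumption
made for illustration: the sample header is invented and far from complete, it
is written so that every element is explicitly closed (so an ordinary XML
parser can read it, which real SGML does not require), and the five-line
mapping to MARC fields is my own simplification, not the correspondence given
in the Guidelines.

```python
import xml.etree.ElementTree as ET

# Invented, incomplete header for illustration only. The element names follow
# the TEI header, but a real header needs more than this (a source
# description, for example).
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A Sample Electronic Text</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Library Press</publisher>
      <date>1995</date>
    </publicationStmt>
  </fileDesc>
  <profileDesc>
    <textClass>
      <keywords>
        <term>Literacy</term>
        <term>Adult education</term>
      </keywords>
    </textClass>
  </profileDesc>
</teiHeader>
"""

# Hypothetical, simplified mapping: (path in header, MARC tag, subfield code).
FIELD_MAP = [
    ("fileDesc/titleStmt/author", "100", "a"),           # main entry, personal name
    ("fileDesc/titleStmt/title", "245", "a"),            # title statement
    ("fileDesc/publicationStmt/publisher", "260", "b"),  # name of publisher
    ("fileDesc/publicationStmt/date", "260", "c"),       # date of publication
    ("profileDesc/textClass/keywords/term", "650", "a"), # topical subject term
]


def header_to_marc(header_xml):
    """Return (tag, subfield, value) tuples pulled from a TEI-style header."""
    root = ET.fromstring(header_xml)
    fields = []
    for path, tag, code in FIELD_MAP:
        # findall returns every matching element, so repeated elements such
        # as <term> each produce their own field.
        for element in root.findall(path):
            text = (element.text or "").strip()
            if text:
                fields.append((tag, code, text))
    return fields


if __name__ == "__main__":
    for tag, code, value in header_to_marc(SAMPLE_HEADER):
        print(f"{tag} ${code} {value}")
```

The point of the sketch is only that the description travels inside the text
file, so a MARC-style record can be derived from it rather than keyed
separately.  A production loader would need the full correspondence from the
Guidelines, plus indicators, punctuation, and the rest of the MARC apparatus.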

Unfortunately, there is no address given in the books for where to write for 
info. on TEI, and of course I don't have the order form because I sent it in to 
get my copy.  I'll look around on the net to see if I can't get an address.  Or 
if someone else knows it, maybe you could post it.

Tom Eland, Librarian
Minneapolis Community and Technical College
1501 Hennepin Avenue
Minneapolis, MN 55403
tweland at mm.com


More information about the Web4lib mailing list