[WEB4LIB] Copernic Summarizer // FW: Metadata Conversion and the Library OPAC

Tue Feb 27 07:20:02 EST 2001

At 5:48 PM -0800 26/2/01, ernest perez wrote:
>The following "abstract" (??) took about 5 seconds in Summarizer.
>This resulted from  auto-analysis and summary of "Metadata Conversion
>and the Library OPAC," by Amanda Xu, at 
><http://web.mit.edu/waynej/www/xu.htm>.
>
>This sample of the Copernic product output resulted from Ms. Xu's 11-page
>HTML document. It's not an easy text to summarize.
>
>You may be interested in comparing the summary to the original.
>Do YOU think it's a decent summary?

This kind of capability has been built into the Apple operating 
system for about two years, since MacOS 8.5 was released with Sherlock 
http://www.asia.apple.com/sherlock/ , which indexes all files on a 
system so that you can search not just by file name but by content.

The "abstract" it produces is:

>Not all resources with metadata attached will be discovered by 
>search engines, because the types of metadata a search engine 
>gathers depends largely on the types of metadata templates that are 
>profiled. [3] For those "Internet accessible but non-HTML based 
>resources," [4] metadata can be accessed via protocols such as 
>Whois++, [5] LDAP (Lightweight Directory Access Protocol), [6] 
>Z39.50, [7] or other proprietary search engines.... The most frequently 
>mapped metadata formats are: IAFA (Internet Anonymous FTP Archive) 
>templates, [12] Dublin Core metadata sets, [13] USMARC, GILS 
>(Government Information Locator Services), [14] SGML TEI Header, 
>EAD, [15] and Z39.50 tag set G. [16] Among them, USMARC, TEI 
>Headers, EAD, GILS, and Dublin Core can represent the center of 
>metadata mapping.
>
>By mapping the content, syntax, and data elements of various 
>metadata models, correct metadata conversion between various 
>syntaxes can be assured. [17] Sketchy records such as IAFA and 
>Dublin Core records can be accurately upgraded during the migration 
>so as to satisfy the needs of rich description records such as 
>USMARC, TEI Header, and EAD....
>
>Specifically speaking, how can those "aggregated metadata objects" 
>[18] such as USMARC bibliographic records, SGML metadata records, 
>Dublin Core metadata records, GILS records, finding aids in EAD, and 
>other future metadata records be organized in a consistent way so 
>that they can be interchanged in a distributed networking 
>environment?...
>
>Many libraries are using metadata already. [26, 27, 28, 29] They 
>have created EAD records for archival description and finding aids, 
>SGML TEI Headers for electronic text, Dublin Core for simplified 
>description of networked resources, and GILS for Government 
>Information Locator Services....
>
>Therefore, if library OPACs are used as gateways to access all the 
>databases, including metadata repositories either on library Web 
>sites or on local databases, a metadata conversion system built into 
>the gateway is needed to ensure metadata interchange....
>
>In situation A, metadata can be extracted by automatically matching 
>semantically similar elements and structures found in standard 
>metadata format templates, namely the templates for Dublin Core 
>Metadata Sets, EAD, TEI Header, GILS, and the USMARC format....
>
>* For locally created metadata on the repository (that is, 
>specialized metadata mounted on local databases), the metadata 
>conversion system will identify the content-bearing metadata 
>elements, load them into a specified USMARC template, and convert 
>and index them into existing databases;
>
>* When a library OPAC is used as a gateway to access remote 
>metadata repositories, a metadata conversion system will verify if 
>the resources contain meta-information, load the data elements into 
>metadata templates, confirm the metadata format and encoding level, 
>and then display metadata in user-specified formats....
>
>Another reason for using USMARC as a common template is that the 
>Electronic Location and Access (856) field has been added to USMARC, 
>making it possible to connect USMARC records to their source data 
>directly via sophisticated OPACs.
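
To make the "load them into a specified USMARC template" step in the 
summary concrete, here is a minimal sketch of a Dublin Core to USMARC 
crosswalk. It is my own illustration, not taken from Ms. Xu's paper: 
the function name dc_to_marc, the DC_TO_MARC table and the particular 
tag/subfield choices are assumptions, loosely modelled on commonly 
published Dublin Core/MARC crosswalks, and would need checking against 
whatever mapping a real system adopts.

```python
# Illustrative crosswalk: Dublin Core element -> (MARC tag, subfield code).
# ASSUMPTION: these tag choices are for demonstration only; they are not
# taken from the paper being summarized.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "identifier": ("856", "u"),  # Electronic Location and Access
}


def dc_to_marc(dc_record):
    """Map a dict of Dublin Core elements to (tag, subfield, value) tuples.

    Returns the mapped fields sorted by MARC tag, plus a list of element
    names that had no entry in the crosswalk table.
    """
    fields = []
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            unmapped.append(element)
            continue
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of values
        for value in values:
            fields.append((target[0], target[1], value))
    fields.sort(key=lambda field: field[0])  # stable sort by tag
    return fields, unmapped


if __name__ == "__main__":
    record = {
        "Title": "Metadata Conversion and the Library OPAC",
        "Creator": "Xu, Amanda",
        "Identifier": "http://web.mit.edu/waynej/www/xu.htm",
    }
    for tag, code, value in dc_to_marc(record)[0]:
        print(f"{tag} ${code} {value}")
```

Elements with no entry in the table are reported rather than silently 
dropped, since a sketchy record upgraded to a richer format usually 
needs a human to look at whatever did not map.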

Serac Software has extended this capability to the web with 
iRemember http://www.seracsoftware.com/iremember.html

iRemember watches the web traffic to your machine and indexes every 
page you view. I now have a database of 9500 pages which I find 
extremely useful - often more so than the big search engines.
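
For anyone curious what "indexes every page you view" might involve, 
here is a rough sketch of a content index over visited pages. I do not 
know how iRemember is actually implemented; the PageIndex class and 
its methods are hypothetical names for a plain inverted index, and the 
page text is assumed to have been extracted from the HTML already.

```python
import re
from collections import defaultdict


class PageIndex:
    """A toy inverted index: word -> set of URLs whose text contains it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _words(text):
        # Lower-case and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, text):
        """Record every word of a viewed page against its URL."""
        for word in self._words(text):
            self._postings[word].add(url)

    def search(self, query):
        """Return the URLs of pages containing all words in the query."""
        words = self._words(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


if __name__ == "__main__":
    index = PageIndex()
    index.add_page("http://example.org/a", "Dublin Core metadata for OPACs")
    index.add_page("http://example.org/b", "USMARC 856 field and metadata")
    print(sorted(index.search("metadata opac")))
```

A real tool would also need to store titles, dates and the page text 
itself, but the lookup by content rather than by name is the same idea.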

Tony

-- 
phone  +61 2 6241 7659
mailto:me at Tony-Barry.emu.id.au
http://purl.oclc.org/NET/Tony.Barry