[Web4lib] Re: OpenSearch, Koha, and Library Catalogs [SRW/U]
Eric Lease Morgan
emorgan at nd.edu
Wed Jul 27 08:18:49 EDT 2005
On Jul 25, 2005, at 8:42 AM, Ross Singer wrote:
> OpenSearch is much lighter than that and would actually work better
> in a lot of arenas, such as providing an interface to courseware/
> campus portals and other places where you need to provide a simple
> interface to your collections....
I will try to describe in a bit more detail what I see as the
advantages of Search/Retrieve via URL (SRU).
A Web Service
SRU is a standardized search protocol in the shape of a Web Service.
In other words, one computer sends another computer a specifically
shaped URL. The second computer uses the URL as input, does some
processing, and returns a standardized stream of XML. It is up to
the first computer to then transform the resulting XML into something
else such as an HTML page, an email message, an RSS feed, or maybe
even an SQL query used to update a database. In this way, SRU is
no different from OpenSearch. There are many examples of SRU
interfaces to indexes. I like to point to my Ockham Alerting services
as a case in point:
http://alert.ockham.org/
SRU "verbs"
Like OAI-PMH, another protocol in the form of a Web Service, the
URLs of SRU requests are rather simple. Where OAI has six "verbs",
SRU only needs three. They are explain, scan, and searchRetrieve.
Explain is used to learn about the underlying index: its name,
administrator, size, supported metadata, updating frequency, etc. It
is akin to the identify verb in OAI. The scan process is similar to
browsing the back-of-a-book index. Look at an item in the back-of-a-
book index, get an idea of how many documents are associated with the
term, and get an idea of what terms surround the term. searchRetrieve
is the heart of the matter. It provides methods for querying the
underlying index. For librarians, the great thing about
searchRetrieve is the expressiveness of the query language -- CQL.
The query language allows for every type of Boolean operation,
nesting, field, proximity, and wild card searching. If I remember
correctly, OpenSearch only provides the means for simple term and/or
phrase searching.
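
To make the three operations a bit more concrete, here is a rough Python sketch of how the request URLs might be built. The base URL is hypothetical, the dc.title index is only an assumption about what a given server supports, and the helper name sru_url is mine, not part of any SRU toolkit.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.org/sru"


def sru_url(operation, **params):
    """Build an SRU 1.1 request URL for one of the three operations."""
    query = {"operation": operation, "version": "1.1"}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# explain: ask the server to describe itself
explain_url = sru_url("explain")

# scan: browse the index terms around "dog" in an assumed dc.title index
scan_url = sru_url("scan", scanClause="dc.title = dog")

# searchRetrieve: a CQL query using a field, a wildcard, Boolean
# operators, and nesting
search_url = sru_url(
    "searchRetrieve",
    query="dc.title = dog* and (cat or fish)",
    maximumRecords="10",
)

if __name__ == "__main__":
    for url in (explain_url, scan_url, search_url):
        print(url)
```

The only thing that changes between the three requests is the operation parameter and the handful of parameters that go with it.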
SRU results
Where OpenSearch returns something looking a whole lot like RSS, the
results of SRU queries are a whole lot more flexible. Any combination
of namespaces can be included in the results, allowing for any
combination of metadata elements, more than the RSS-like results of
OpenSearch. Because of this flexibility it is possible to encode more
than just title, URL, description, and keywords. If you wanted to
encode bibliographic and holdings data, then that is entirely
possible. If you wanted to return e-learning information such as
courseware, then that is possible too. Other options include just
about anything you can think of. GIS. Dates. Names and addresses.
Pointers to images and music. The specifications for building a
house. Etc.
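
As a small illustration of mixing namespaces, the Python sketch below reads a made-up record that carries Dublin Core elements alongside holdings data. The holdings namespace URI, its element names, and the call number are all inventions for the example, and the function name fields_by_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up response fragment. The Dublin Core namespace is real; the
# "hold" namespace and its elements are invented for illustration.
RESPONSE = """\
<searchRetrieveResponse xmlns:dc="http://purl.org/dc/elements/1.1/"
                        xmlns:hold="http://example.org/holdings/">
  <records>
    <record>
      <recordData>
        <dc:title>The bottom dog</dc:title>
        <dc:identifier>http://example.org/bottom.html</dc:identifier>
        <hold:location>Main stacks</hold:location>
        <hold:callNumber>PS 3507 .A33</hold:callNumber>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def fields_by_namespace(xml_text):
    """Group the elements inside each recordData by namespace URI."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for data in root.iter("recordData"):
        for element in data:
            # ElementTree spells qualified names as "{uri}local"
            if element.tag.startswith("{"):
                uri, _, local = element.tag[1:].partition("}")
            else:
                uri, local = "", element.tag
            grouped.setdefault(uri, {})[local] = element.text
    return grouped


if __name__ == "__main__":
    for uri, fields in fields_by_namespace(RESPONSE).items():
        print(uri, fields)
```

A client that only understands Dublin Core can pick out its own namespace and ignore the rest, while a library-aware client can use the holdings data as well.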
SRU clients
Creating an SRU client is not as hard as it might seem to be. All you
really need is an HTML form and a browser that understands CSS and/or
XSLT. Here is a simple client:
<form action='sru-server.cgi' method='GET'>
<input name='operation' type='hidden' value='searchRetrieve' />
<input name='version' type='hidden' value='1.1' />
<input name='stylesheet' type='hidden' value='sru-results.css' />
<input name='query' type='text' value='dogs and cats' />
</form>
This might result in an XML stream looking something like this:
<?xml-stylesheet type='text/css' href='sru-results.css'?>
<searchRetrieveResponse>
<version>1.1</version>
<numberOfRecords>1</numberOfRecords>
<records>
<record>
<recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<dc>
<title>The bottom dog</title>
<identifier>http://example.org/bottom.html</identifier>
</dc>
</recordData>
</record>
</records>
</searchRetrieveResponse>
The browser would then use the local CSS file named sru-results.css to
render the XML into something more readable for the user.
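
A client does not have to be a browser. As a minimal sketch, the Python below takes the sample response above (minus the stylesheet instruction) and turns it into an HTML list. The function name to_html is hypothetical, and the code assumes the un-namespaced element names used in the sample; a real server's response would normally carry namespaces that this sketch ignores.

```python
import xml.etree.ElementTree as ET
from html import escape

# The sample response from this message, without the stylesheet line.
SAMPLE = """\
<searchRetrieveResponse>
<version>1.1</version>
<numberOfRecords>1</numberOfRecords>
<records>
<record>
<recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<dc>
<title>The bottom dog</title>
<identifier>http://example.org/bottom.html</identifier>
</dc>
</recordData>
</record>
</records>
</searchRetrieveResponse>
"""


def to_html(xml_text):
    """Render the records of a searchRetrieveResponse as an HTML list."""
    root = ET.fromstring(xml_text)
    count = root.findtext("numberOfRecords", default="0")
    items = []
    for dc in root.iter("dc"):
        title = dc.findtext("title", default="(untitled)")
        url = dc.findtext("identifier", default="")
        # Escape both values so record content cannot break the markup
        items.append(
            '<li><a href="%s">%s</a></li>' % (escape(url, quote=True), escape(title))
        )
    return "<p>%s record(s)</p>\n<ul>\n%s\n</ul>" % (escape(count), "\n".join(items))


if __name__ == "__main__":
    print(to_html(SAMPLE))
```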
SRU and OpenSearch
Yes, at first glance it would seem that OpenSearch is easier to
implement, but such an implementation will be limiting. You do not
get an expressive query language with OpenSearch, nor do you get very
expressive search results. OpenSearch is an industry standard, sort
of. SRU is a community standard. It builds on the experience of the
Z39.50 community while losing Z39.50's complexity. Like OpenSearch,
SRU does not dictate the underlying search engine which could be
direct access to a relational database, an index created from swish-e
or plucene, or even a simple grep applied against a text file. There
are many institutions implementing SRU including the Library of
Congress, OCLC, IndexData, etc. There are many tools written in many
languages used to implement SRU including things in Perl, Python, C,
Java, etc.
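
To show how little the protocol cares about the search engine behind it, here is a deliberately naive Python sketch of the "grep against a text file" case. It is an assumption-laden toy: it matches a single term as a case-insensitive substring rather than parsing CQL, treats each matching line as a title, and the function name grep_search is hypothetical.

```python
from xml.sax.saxutils import escape


def grep_search(term, lines):
    """Wrap the lines containing `term` in a searchRetrieveResponse."""
    # The "search engine": a case-insensitive substring match per line
    hits = [line.strip() for line in lines if term.lower() in line.lower()]
    records = "".join(
        "<record>"
        "<recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>"
        "<recordPacking>xml</recordPacking>"
        "<recordData><dc><title>%s</title></dc></recordData>"
        "</record>" % escape(hit)
        for hit in hits
    )
    return (
        "<searchRetrieveResponse>"
        "<version>1.1</version>"
        "<numberOfRecords>%d</numberOfRecords>"
        "<records>%s</records>"
        "</searchRetrieveResponse>" % (len(hits), records)
    )


if __name__ == "__main__":
    titles = ["The bottom dog\n", "Cats & kittens\n", "Dogs at work\n"]
    print(grep_search("dog", titles))
```

Swapping the list comprehension for a database query or a call to an indexer would leave the response format, and therefore every client, unchanged.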
Conclusion
I sincerely believe SRU implementations in libraries used to
facilitate search (and remember, librarians love to search and
everybody else likes to find) will be much more scalable and
functional than implementations using OpenSearch. While OpenSearch
moves in the right direction, SRU moves there better and further. For
more information about SRU see:
http://www.loc.gov/z3950/agency/zing/srw/
--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604