[Web4lib] Re: OpenSearch, Koha, and Library Catalogs [SRW/U]

Wed Jul 27 08:18:49 EDT 2005

On Jul 25, 2005, at 8:42 AM, Ross Singer wrote:

> OpenSearch is much lighter than that and would actually work better  
> in a lot of arenas, such as providing an interface to courseware/ 
> campus portals and other places where you need to provide a simple  
> interface to your collections....

I will try to describe in a bit more detail what I see as the  
advantages of Search/Retrieve via URL (SRU).

A Web Service

SRU is a standardized search protocol in the shape of a Web Service.  
In other words, one computer sends another computer a specifically  
shaped URL. The second computer uses the URL as input, does some  
processing, and returns a standardized stream of XML. It is up to
the first computer to then transform the resulting XML into something
else such as an HTML page, an email message, an RSS feed, or maybe  
even an SQL query used to update a database. In this way, SRU is
no different from OpenSearch. There are many examples of SRU
interfaces to indexes. I like to point to my Ockham Alerting services  
as a case in point:

   http://alert.ockham.org/

SRU "verbs"

Like OAI-PMH, another protocol in the form of a Web Service, the  
URLs of SRU requests are rather simple. Where OAI has six "verbs",
SRU only needs three. They are explain, scan, and searchRetrieve.  
Explain is used to learn about the underlying index: its name,
administrator, size, supported metadata, updating frequency, etc. It
is akin to the identify verb in OAI. The scan process is similar to  
browsing the back-of-a-book index. Look at an item in the back-of-a- 
book index, get an idea of how many documents are associated with the  
term, and get an idea of what terms surround the term. searchRetrieve
is the heart of the matter. It provides methods for querying the  
underlying index. For librarians, the great thing about  
searchRetrieve is the expressiveness of the query language -- CQL.  
The query language allows for every type of Boolean operation,  
nesting, field, proximity, and wild card searching. If I remember  
correctly, OpenSearch only provides the means for simple term and/or  
phrase searching.
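
To make the three verbs concrete, here is a minimal Python sketch
that builds one request URL for each of them. The endpoint
(http://example.org/sru-server.cgi), the dc.title index, and the
helper name request_url are assumptions made for illustration; a
real server advertises its own indexes through explain.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint, used only for illustration.
BASE = "http://example.org/sru-server.cgi"


def request_url(operation, **extra):
    """Build an SRU 1.1 request URL for the given operation."""
    params = {"operation": operation, "version": "1.1"}
    params.update(extra)
    # urlencode percent-encodes quotes, parentheses, and spaces in CQL.
    return BASE + "?" + urlencode(params)


# explain: ask the server to describe itself.
explain = request_url("explain")

# scan: browse the index terms around a starting point.
scan = request_url("scan", scanClause='dc.title = "dog"')

# searchRetrieve: a CQL query combining a fielded term, a wild card,
# Boolean operators, and nesting.
search = request_url(
    "searchRetrieve",
    query='dc.title = "dog*" and (cat or fish)',
)

for url in (explain, scan, search):
    print(url)
```

Only the parameters change from verb to verb; the CQL query travels
as an ordinary, URL-encoded string.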

SRU results

Where OpenSearch returns something looking a whole lot like RSS, the  
results of SRU queries are a whole lot more flexible. Any combination  
of namespaces can be included in the results allowing for any
combination of metadata elements, more than the RSS-like results of  
OpenSearch. Because of this flexibility it is possible to encode more  
than just title, URL, description, and keywords. If you wanted to  
encode bibliographic and holdings data, then that is entirely  
possible. If you wanted to return e-learning information such as  
courseware, then that is possible too. Other options include just  
about anything you can think of. GIS. Dates. Names and addresses.  
Pointers to images and music. The specifications for building a  
house. Etc.
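
As a rough illustration of mixing namespaces, the Python sketch
below reads a record that carries Dublin Core elements alongside
holdings data. The holdings namespace (http://example.org/holdings/)
and its location and status elements are invented for this example
and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# A record mixing Dublin Core with a made-up holdings namespace.
RECORD = """
<recordData xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:h="http://example.org/holdings/">
  <dc:title>The bottom dog</dc:title>
  <dc:identifier>http://example.org/bottom.html</dc:identifier>
  <h:location>Main stacks</h:location>
  <h:status>available</h:status>
</recordData>
""".strip()

# Prefix-to-URI map used when looking up namespaced elements.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "h": "http://example.org/holdings/",  # hypothetical namespace
}

root = ET.fromstring(RECORD)
record = {
    "title": root.findtext("dc:title", namespaces=NS),
    "identifier": root.findtext("dc:identifier", namespaces=NS),
    "location": root.findtext("h:location", namespaces=NS),
    "status": root.findtext("h:status", namespaces=NS),
}
print(record)
```

A client that only understands Dublin Core can ignore the holdings
elements, while a catalog-aware client can use both.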

SRU clients

Creating an SRU client is not as hard as it might seem to be. All you  
really need is an HTML form and a browser that understands CSS and/or  
XSLT. Here is a simple client:

   <form action='sru-server.cgi' method='GET'>
    <input name='operation'  type='hidden' value='searchRetrieve'  />
    <input name='version'    type='hidden' value='1.1'             />
    <input name='stylesheet' type='hidden' value='sru-results.css' />
    <input name='query'      type='text'   value='dogs and cats'   />
    <input                   type='submit' value='Search'          />
   </form>

This might result in an XML stream looking something like this:

   <?xml-stylesheet type='text/css' href='sru-results.css'?>
   <searchRetrieveResponse>
    <version>1.1</version>
    <numberOfRecords>1</numberOfRecords>
    <records>
     <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
       <dc>
        <title>The bottom dog</title>
        <identifier>http://example.org/bottom.html</identifier>
       </dc>
      </recordData>
     </record>
    </records>
   </searchRetrieveResponse>

The browser would then use the local CSS file named sru-results.css to
render the XML into something more readable for the user.
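
A browser and a stylesheet are not the only option. As a sketch of
the "transform the XML into something else" step, the Python below
turns the simplified response shown above into an HTML list. The
function name to_html is hypothetical, and the code assumes the
un-namespaced elements of the example; responses from a real server
are namespaced and would need a namespace map.

```python
import xml.etree.ElementTree as ET
from html import escape

# The simplified, un-namespaced response from the example above.
RESPONSE = """<searchRetrieveResponse>
 <version>1.1</version>
 <numberOfRecords>1</numberOfRecords>
 <records>
  <record>
   <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
   <recordPacking>xml</recordPacking>
   <recordData>
    <dc>
     <title>The bottom dog</title>
     <identifier>http://example.org/bottom.html</identifier>
    </dc>
   </recordData>
  </record>
 </records>
</searchRetrieveResponse>"""


def to_html(xml_text):
    """Render each <dc> record as a linked HTML list item."""
    root = ET.fromstring(xml_text)
    items = []
    for dc in root.iter("dc"):
        title = dc.findtext("title", default="(untitled)")
        url = dc.findtext("identifier", default="")
        # Escape both values so record data cannot break the markup.
        items.append(
            f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(to_html(RESPONSE))
```

The same loop could just as easily write an email message or an RSS
item instead of HTML.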

SRU and OpenSearch

Yes, at first glance it would seem that OpenSearch is easier to  
implement, but such an implementation will be limiting. You do not  
get an expressive query language with OpenSearch, nor do you get very  
expressive search results. OpenSearch is an industry standard, sort  
of. SRU is a community standard. It builds on the experience of the  
Z39.50 community while losing Z39.50's complexity. Like OpenSearch,  
SRU does not dictate the underlying search engine which could be  
direct access to a relational database, an index created from swish-e  
or plucene, or even a simple grep applied against a text file. There  
are many institutions implementing SRU including the Library of  
Congress, OCLC, IndexData, etc. There are many tools written in many  
languages used to implement SRU including things in Perl, Python, C,  
Java, etc.
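
To show how little the protocol cares about what sits behind it,
here is a toy Python sketch of the "simple grep" case: it matches a
single term against lines of tab-delimited text and wraps the hits
in a response shaped like the example above. The data, the function
name search_retrieve, and the single-term matching are all
assumptions for illustration; a real server would parse the CQL
query rather than treat it as one string.

```python
import xml.etree.ElementTree as ET

# Stand-in for a text file: one "title<TAB>url" record per line.
LINES = [
    "The bottom dog\thttp://example.org/bottom.html",
    "Cats of the world\thttp://example.org/cats.html",
]


def search_retrieve(term, lines=LINES):
    """Grep-like search that returns a searchRetrieveResponse string."""
    # Case-insensitive substring match, much like "grep -i".
    hits = [line for line in lines if term.lower() in line.lower()]

    response = ET.Element("searchRetrieveResponse")
    ET.SubElement(response, "version").text = "1.1"
    ET.SubElement(response, "numberOfRecords").text = str(len(hits))
    records = ET.SubElement(response, "records")

    for line in hits:
        title, url = line.split("\t", 1)
        record = ET.SubElement(records, "record")
        ET.SubElement(record, "recordSchema").text = (
            "info:srw/schema/1/dc-v1.1"
        )
        ET.SubElement(record, "recordPacking").text = "xml"
        data = ET.SubElement(record, "recordData")
        dc = ET.SubElement(data, "dc")
        ET.SubElement(dc, "title").text = title
        ET.SubElement(dc, "identifier").text = url

    return ET.tostring(response, encoding="unicode")


print(search_retrieve("dog"))
```

Swapping the list of lines for a database query or a swish-e index
would change the body of the function but not the shape of what it
returns.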

Conclusion

I sincerely believe SRU implementations in libraries used to  
facilitate search (and remember, librarians love to search and  
everybody else likes to find) will be much more scalable and
functional than implementations using OpenSearch. While OpenSearch  
moves in the right direction, SRU moves there better and further. For  
more information about SRU see:

   http://www.loc.gov/z3950/agency/zing/srw/

-- 
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame

(574) 631-8604