[Web4lib] Federated Search versus Crawler or Spider

Danielle Plumer dplumer at tsl.state.tx.us
Tue Jul 3 10:12:55 EDT 2007


I've been working on a federated search project here in Texas (www.texasheritageonline.org), and the one lesson I think we've learned is that if you rely on a single protocol you're not going to get very far. I should note that my project is mostly looking at freely available materials, although a sister site, www.libraryoftexas.org, does federated search across licensed content.

We have sites that support OAI-PMH (with remarkably little standardization of metadata), sites on Z39.50, and sites using SRU. We also have large legacy databases for which we're developing procedures to add one of the above protocols (SRU in particular is open to community development, BTW). And I hope in the future to add support for indexing web-harvested material, possibly using OpenSearch (another relatively easy-to-implement option).
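
To give a sense of how different these protocols look at the request level, here is a minimal sketch that builds an SRU searchRetrieve URL and an OAI-PMH ListRecords URL. The base URLs and function names are placeholders invented for illustration, not the project's actual endpoints or code; the query parameter names are the standard ones from the SRU 1.1 and OAI-PMH 2.0 specifications.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, for illustration only.
SRU_BASE = "http://example.org/sru"
OAI_BASE = "http://example.org/oai"


def sru_search_url(base, cql_query, max_records=10):
    """Build an SRU searchRetrieve URL: one live query, results returned at once."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # a CQL query string
        "maximumRecords": str(max_records),
    }
    return base + "?" + urlencode(params)


def oai_list_records_url(base, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL: a bulk harvest, not a search."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base + "?" + urlencode(params)


print(sru_search_url(SRU_BASE, "texas"))
print(oai_list_records_url(OAI_BASE))
```

The difference in shape is the point: SRU (like Z39.50) answers a query at search time, while OAI-PMH hands over records to be harvested and indexed locally, so a federated search layer ends up needing both a live-query path and a harvest path.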

So, I'd say the onus is on the federated search developer to support multiple protocols and do normalization of widely heterogeneous result sets. I don't know that we'll ever get to a point where I can say that *every* resource is included, but I do think we can include everything that Google and Yahoo! can get to, and then some.
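
As a rough sketch of what "normalization of heterogeneous result sets" can mean in practice, the example below maps records from two differently shaped sources into one common record type and merges them. Everything here is an assumption made for illustration: the `Record` fields, the function names, and especially the flat dictionaries standing in for parsed Dublin Core and MARC data (real parsers return richer structures).

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical common record shape used by the search layer."""
    title: str
    creator: str
    source: str


def from_oai_dc(fields, source):
    # Dublin Core elements are repeatable, so each value is assumed to be a list.
    return Record(
        title=(fields.get("title") or [""])[0].strip(),
        creator="; ".join(v.strip() for v in fields.get("creator", [])),
        source=source,
    )


def from_marc(fields, source):
    # Assumes a simplified dict keyed by tag+subfield (245$a title, 100$a name);
    # trailing ISBD punctuation is stripped so values compare cleanly.
    return Record(
        title=fields.get("245a", "").rstrip(" /:").strip(),
        creator=fields.get("100a", "").rstrip(" ,").strip(),
        source=source,
    )


def merge(*result_sets):
    """Concatenate result sets, dropping case-insensitive title/creator duplicates."""
    seen = set()
    merged = []
    for result_set in result_sets:
        for rec in result_set:
            key = (rec.title.casefold(), rec.creator.casefold())
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


oai_hits = [from_oai_dc({"title": ["Texas Maps "], "creator": ["Smith, J."]}, "oai")]
z_hits = [from_marc({"245a": "Texas maps /", "100a": "Smith, J.,"}, "z3950")]
results = merge(oai_hits, z_hits)
```

Each new protocol then only needs its own small adapter function, and the merging and display code stays the same.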

Danielle Cunniff Plumer, Coordinator
Texas Heritage Digitization Initiative
Texas State Library and Archives Commission
512.463.5852 (phone) / 512.936.2306 (fax)
dplumer at tsl.state.tx.us

-----Original Message-----
From: web4lib-bounces at webjunction.org
[mailto:web4lib-bounces at webjunction.org]On Behalf Of Andrew Ashton
Sent: Tuesday, July 03, 2007 7:59 AM
To: Ross Singer; McIntyre, Ruth
Cc: web4lib at webjunction.org
Subject: RE: [Web4lib] Federated Search versus Crawler or Spider


Ross Singer wrote:
"2) the onus is on the content providers to provide a standardized
search interface - you lose all control about what is indexed/how it's
indexed and how search results are presented"

This seems to be the kiss of death for any really useful Federated
Search projects.  Remember the OCLC SiteSearch project?  We did a
proof-of-concept test here at
Skidmore College and discovered quickly that the lack of vendor
standardization made it impractical.  Librarians are nothing if not
completists, and offering a Federated Search service that only covered
2/3 of our resources wasn't going to cut it.  Sure, you could buy a
Federated Search product and let the vendor worry about maintaining
access to the non-standardized targets, but any technology that
precludes community development ought to be considered dead in the
water.

--
Andrew Ashton
Systems Librarian
Scribner Library, Skidmore College
(518)580-5505




_______________________________________________
Web4lib mailing list
Web4lib at webjunction.org
http://lists.webjunction.org/web4lib/
