[Web4lib] Faceted navigation as metasearch

Peter Noerr pnoerr at MuseGlobal.com
Thu Jan 4 16:16:40 EST 2007


Along with the other metasearch systems which have been mentioned in
other emails (MetaLib from Ex Libris, Central Search from Serials
Solutions - both of which cluster using Vivisimo's Clusty engine), we
(Muse from MuseGlobal) have added what we call Content Mining to our
metasearch engine. (Blame the need for distinguishable product names;
the overall functionality is very similar.) Our partners will release
it under different trade names, starting at ALA Midwinter. That's the
answer to the "Anyone know..." question.

On the issue of response times, which several people raised: all the
metasearch engines must wait for results and then process them on the
fly, so they will be slower than a faceted database system which has
done all the work beforehand. However, they work on the material you
have actually retrieved, from wherever, not just on what was loaded
into the database on a just-in-case basis.

I've just run a search on one of our systems, running on a medium-sized
PC. (The operational systems run on much bigger hardware and often
handle hundreds of simultaneous sessions.) From 18 Web search engines we
get about 160 records. The first of these are displayed and usable in
about 3 seconds, along with the top 5 extracted terms. The list then
keeps growing for up to about 20 seconds, by which time it is well off
the first page and we are waiting for timeouts to come into play.
Against slower sources (and, unfortunately, that includes most library
catalogs) the times get longer. Our tests show that we generally add
less than 0.5 seconds to a response, which is about the native response
time of the GYM (Google, Yahoo, MSN) search engines. Is that too slow
for what you get?
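
To make the timing behaviour above concrete, here is a minimal Python
sketch of a metasearch fan-out: query every source in parallel, merge
records as each source answers, and stop waiting when an overall timeout
expires. It is an illustration only, not how Muse is implemented; the
source names, latencies and the query_source() connector are all
hypothetical stand-ins.

```python
import time
import concurrent.futures as cf

# Hypothetical sources: name -> (simulated latency in seconds, records).
SOURCES = {
    "fast_engine": (0.01, ["a1", "a2"]),
    "medium_engine": (0.1, ["b1"]),
    "slow_catalog": (1.0, ["c1", "c2", "c3"]),
}

def query_source(name):
    """Stand-in for a real connector; sleeps to simulate network latency."""
    latency, records = SOURCES[name]
    time.sleep(latency)
    return name, records

def metasearch(source_names, overall_timeout):
    """Merge records in arrival order; drop sources slower than the timeout."""
    merged, answered = [], []
    pool = cf.ThreadPoolExecutor(max_workers=len(source_names))
    futures = [pool.submit(query_source, name) for name in source_names]
    try:
        for future in cf.as_completed(futures, timeout=overall_timeout):
            name, records = future.result()
            answered.append(name)
            merged.extend(records)  # a real UI would refresh the display here
    except cf.TimeoutError:
        pass  # some sources were too slow; return what has arrived so far
    finally:
        pool.shutdown(wait=False)  # do not block on the stragglers
    return merged, answered

merged, answered = metasearch(list(SOURCES), overall_timeout=0.4)
```

With these simulated latencies the two quick sources contribute their
records and the slow catalog is cut off by the timeout, which is the
trade-off described above: early results are usable quickly, and the
total wait is bounded by the timeout rather than by the slowest source.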

As for what you get: a point nobody raised is that metasearch engines
(Central Search, MetaLib, Muse, etc.), analysis engines (like Vivisimo
and Grokker), and faceted database systems (like Endeca and even FAST)
are all trying to do different things.

Metasearch extends your search scope by going after ad hoc chosen
resources, and produces uniform results.
Analysis organises your results (wherever they came from) into more
meaningful groups so you can winnow the results more effectively.
Faceted databases create indexes (based on field content) of their
records and display those to narrow results more easily.

This is why the other metasearch engines have bought in analysis, and
why we are adding our own analysis functions to post-process the
results: to give the best of both worlds. Note that both Vivisimo and
Grokker, in their native form, have low-level metasearch engines
feeding them.
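
As a rough illustration of what "analysis" means here, the Python
sketch below extracts the most common terms from a merged result list,
wherever the records came from. It is a deliberately naive frequency
count; the stopword list and sample records are invented for the
example, and commercial engines such as Vivisimo do considerably more
than this.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def top_terms(records, n=5):
    """Rank terms by how many records mention them (document frequency)."""
    counts = Counter()
    for text in records:
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOPWORDS)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]

# Invented sample records standing in for merged metasearch results.
records = [
    "Faceted navigation for library catalogs",
    "Metasearch and faceted browsing in the library",
    "Clustering metasearch results",
]
print(top_terms(records, n=3))
```

The point of the sketch is only that the terms come out of the text of
the retrieved records themselves, not out of a pre-built index.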

The faceted databases are completely different, and actually use a
completely different meaning for 'faceted': it is not the extraction of
terms from documents, as the analysis engines do. The faceted databases
(as Josh Ferraro described) aggregate values from fields of their
records and then display them to the users. So you can use a facet to
narrow the results after the search, or you can apply a limit to the
search in the first place. Same functionality, different operation, and
hopefully the same result.
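
A small Python sketch of that last point, under invented data: facet
values are counted from a record field, and narrowing by a facet after
the search returns the same records as applying the equivalent limit up
front. The record layout and the search(), facet_counts() and narrow()
helpers are hypothetical, not any vendor's API.

```python
from collections import Counter

# Invented records; "format" and "year" play the role of facet fields.
RECORDS = [
    {"title": "Cataloging basics", "format": "book", "year": 2005},
    {"title": "Cataloging online", "format": "article", "year": 2006},
    {"title": "Future of cataloging", "format": "article", "year": 2006},
    {"title": "Metasearch primer", "format": "book", "year": 2006},
]

def search(query, records, limits=None):
    """Title substring search, optionally restricted by field limits."""
    limits = limits or {}
    return [
        r for r in records
        if query.lower() in r["title"].lower()
        and all(r[field] == value for field, value in limits.items())
    ]

def facet_counts(results, field):
    """Aggregate the values of one field across a result set."""
    return Counter(r[field] for r in results)

def narrow(results, field, value):
    """Apply a facet selection to an existing result set."""
    return [r for r in results if r[field] == value]

hits = search("cataloging", RECORDS)
by_format = facet_counts(hits, "format")        # counts shown to the user
narrowed = narrow(hits, "format", "article")    # facet applied afterwards
limited = search("cataloging", RECORDS, {"format": "article"})  # limit first
```

Here narrowed and limited hold the same two records: different
operation, same result.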
 
Sorry this post is so long; there is a real mix of topics here.

Peter 
Dr Peter Noerr
CTO, MuseGlobal, Inc.
www.museglobal.com
+1 801 208 1880 


> I think that pulling in all (or even some) of the retrieved 
> records and creating the facets on the fly would be 
> prohibitively slow. Anyone know a system that can do this?
> 
> kc
> 
> K.G. Schneider wrote:
> > One idea I heard recently had to do with using faceted navigation
> > search engines for metasearch. I've been mulling this over,
> > particularly after rereading Karen Calhoun's "Changing Nature of the
> > Catalog" (www.loc.gov/catdir/calhoun-report-final.pdf) and really,
> > I'm trying to figure out if there's much wrong with this idea.
> >
> > Consider using a tool (Endeca, Siderean, FAST, i411, Dieselpoint -
> > full disclosure: we implemented Siderean at my FPOW) that lies above
> > the catalog and all other discovery services and provides access to
> > ETDs, book data, journal articles, and more, ecumenically searched
> > but parsed out logically in post-coordination. In fact, I'm
> > wondering if NCSU and other libraries now on the faceted-navigation
> > bandwagon are looking at moving in this direction.
> >
> > I do see one biggy. These search engines are themselves pretty
> > lickety-split, but I wonder how slow retrieval would get if they
> > were accessing separate journal and database services (versus what
> > they do now, which is search their own pre-built indexes...). In
> > some cases, it might be easy to create an index (e.g. if we have
> > ETDs, I don't see why they can't be indexed). But journal articles?
> > Pondering again... though if they were accessing resources such as
> > Google Scholar and then tying everything back in...
> 

