[Web4lib] Faceted navigation as metasearch

cpikas.14607360 at bloglines.com cpikas.14607360 at bloglines.com
Fri Jan 5 13:18:48 EST 2007


This is a very helpful explanation, Peter, thanks!

Some researchers at HCIL at Maryland were looking at clustering and faceted
presentation of web search results.  They found that many of the automatically
formed clusters are not particularly informative or helpful.

Bill Kules did his dissertation on faceted presentation of web search
results, using facets provided (I think?) by the DMOZ directory listing.  He
found that _meaningful_ and _stable_ categories are more helpful than the
automatically formed clusters.  See:
http://www.cs.umd.edu/hcil/categorizedsearch/


Although the work was partially funded by AOL, I don't know of any web
search engine employing these techniques.

Christina Pikas

--- "Peter Noerr" <pnoerr at MuseGlobal.com> wrote:
> Along with the other metasearch systems which have been mentioned in
> other emails (MetaLib from Ex Libris, Central Search from Serials
> Solutions - both of which cluster using Vivisimo's Clusty engine), we
> (Muse from MuseGlobal) have added what we call Content Mining (blame the
> need for distinguishable product names, even though the overall
> functionality is very similar) to our metasearch engine. It will be
> released under different trade names by our partners starting at
> Midwinter ALA. That's the answer to the "Anyone know..." question.
> 
> On the issue of response times, raised variously: all the metasearch
> engines must wait for results and then process them on the fly, so they
> will be slower than a faceted database system which has done all the work
> beforehand. However, they will work on the material you have retrieved,
> from wherever, not just what is in the database on a just-in-case basis.
> I've just run a search on one of our systems, running on a medium-sized
> PC. (The operational systems run on much bigger hardware and often run
> hundreds of simultaneous sessions.)  From 18 Web search engines we get
> about 160 records. The first of these are displayed and usable in about
> 3 seconds, along with the top 5 extracted terms. The list then grows for
> up to about 20 seconds - by which time it is well off the first page, and
> we are waiting for timeouts to come into play. Against slower sources
> (and, unfortunately, that includes most library catalogs) the times get
> longer. Our tests show that we generally add less than 0.5 seconds to a
> response - about the native response time of the GYM search engines. Is
> that too slow for what you get?
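
The wait-then-merge behaviour described above can be sketched roughly as
follows. This is only an illustration, not Muse's implementation: the
sources are simulated with sleeps, and the names `fake_source` and
`metasearch`, the delays and the timeout value are all assumptions made up
for the example.

```python
import concurrent.futures as cf
import time

def fake_source(name, delay, hits):
    """Hypothetical stand-in for a remote search engine: sleep, then return records."""
    time.sleep(delay)
    return [f"{name}:{i}" for i in range(hits)]

def metasearch(sources, timeout):
    """Query every source in parallel and keep whatever arrives before the timeout.

    `sources` is a list of (name, delay_seconds, hit_count) tuples.
    Returns (records, names_of_sources_that_timed_out).
    """
    results, timed_out = [], []
    pool = cf.ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fake_source, n, d, h): n for n, d, h in sources}
    try:
        # Records are merged in the order the sources answer, so the fast
        # sources become usable before the slow ones have replied.
        for fut in cf.as_completed(futures, timeout=timeout):
            results.extend(fut.result())
    except cf.TimeoutError:
        timed_out = [n for f, n in futures.items() if not f.done()]
    pool.shutdown(wait=False)  # do not block the caller on the stragglers
    return results, timed_out

if __name__ == "__main__":
    demo = [("fast", 0.05, 2), ("medium", 0.1, 1), ("slow", 1.0, 3)]
    records, late = metasearch(demo, timeout=0.4)
    print(len(records), "records; timed out:", late)
```

The overall response time is bounded by the timeout rather than by the
slowest source, which is why the result list keeps growing until the
timeouts come into play.
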
> 
> About what you get. A point nobody raised is that metasearch engines
> (Central Search, MetaLib, Muse, etc.), analysis engines (like
> Vivisimo and Grokker) and faceted database systems (like Endeca and even
> FAST) are all trying to do different things.
> 
> Metasearch extends your search scope by going after ad hoc chosen
> resources, and produces uniform results.
> Analysis organises your results (wherever they came from) into more
> meaningful groups so you can winnow the results more effectively.
> Faceted databases create indexes (based on field content) of their
> records and display those to narrow results more easily.
> 
> This is why the other metasearch engines have bought in analysis and we
> are adding our own analysis functions to post-process the results to
> give the best of both worlds. Note that both Vivisimo and Grokker have
> low-level metasearch engines feeding them in their native form.
> 
> The faceted databases are completely different and actually use a
> completely different meaning for 'faceted' - it is not the extraction of
> terms from documents as the analysis engines do. The faceted databases
> (as Josh Ferraro described) aggregate values from fields of their
> records, and then display them to the users. So you can use a facet to
> narrow the results after the search, or you can apply a limit to the
> search in the first place: same functionality, different operation -
> hopefully the same result.
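
To make the two meanings of 'faceted' concrete, here is a small sketch. The
records, field names, stopword list and function names are all invented for
illustration, and the word count is only a naive stand-in for what a real
analysis engine does. It shows facet values aggregated from a stored field,
terms extracted from the text itself, and a facet applied after the search
giving the same records as a limit applied up front.

```python
from collections import Counter
import re

# Hypothetical sample records; titles, formats and years are made up.
RECORDS = [
    {"title": "Faceted search for library catalogs", "format": "book", "year": 2006},
    {"title": "Clustering web search results", "format": "article", "year": 2006},
    {"title": "Library metasearch in practice", "format": "article", "year": 2005},
]

STOPWORDS = {"for", "in", "the", "of", "a"}

def facet_counts(records, field):
    """Faceted-database style: aggregate the values already stored in a field."""
    return Counter(r[field] for r in records)

def extracted_terms(records, top=5):
    """Analysis-engine style: derive terms from the text (naive word count)."""
    words = (w for r in records for w in re.findall(r"[a-z]+", r["title"].lower()))
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)

def search(records, word, **limits):
    """Substring match on the title, with optional field limits applied up front."""
    return [r for r in records
            if word in r["title"].lower()
            and all(r[f] == v for f, v in limits.items())]

def narrow(results, field, value):
    """Apply a facet to an existing result set."""
    return [r for r in results if r[field] == value]

if __name__ == "__main__":
    hits = search(RECORDS, "search")
    print(facet_counts(hits, "format"))
    print(extracted_terms(hits))
    print(narrow(hits, "format", "article") == search(RECORDS, "search", format="article"))
```
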
>  
> Sorry this post is so long; there is a real mix of topics here.
> 
> Peter 
> Dr Peter Noerr
> CTO, MuseGlobal, Inc.
> www.museglobal.com
> +1 801 208 1880 
> 


