[Web4lib] web site search engines

K.G. Schneider kgs at bluehighways.com
Thu Sep 22 14:08:18 EDT 2005


Some of the questions I have when I evaluate search engines don't seem to be
answered on the Nutch pages. Is there a features page and I'm just missing
it? 

Questions I have include what kind of searches it supports (quoted, nested,
truncation, wildcarding [and where], Boolean), whether stemming is an option
and what it uses for stemming (and can you add exceptions/changes), Boolean
operator support (can you use Google-like plus or minus or are you stuck
with 1990s terms), weighted field searching, synonym support, what kinds of
indexes it builds, multi-format indexing, incremental indexing, spell-check
support, thesauri support, fielded searching, rank-by-reputation, and a lot
more. 

I want to know how the search engine handles punctuation and special
characters (and what's configurable), document format support,
post-coordination options... well, on and on. Then there is how easy it is to
configure, and how transparent its configuration is to a working
organization: does it require geeky command-line stuff, or can a
knowledgeable manager use a web or software interface to view or modify
settings? 

How about result sorting? Deduping? Tinkering with relevance algorithms?
Ranking overrides? Etc.

I've evaluated the Google client, and I too have a deal-breaking problem with it
being "secret sauce." I also note that many of its capabilities are not
switches. If Google doesn't believe in stemming, you don't get stemming as
an option. I believe that's how it is configured at present. In
metadata-reliant databases, that's a killer. Basically it's designed for
organizations that aren't that interested in search and just want a
reasonably good product so they can go back to selling socks or whatever.

But we're librarians. I search, therefore I am. ;) 

Karen G. Schneider
kgs at bluehighways.com