[Web4lib] web site search engines
K.G. Schneider
kgs at bluehighways.com
Thu Sep 22 14:08:18 EDT 2005
Some of the questions I have when I evaluate search engines don't seem to be
answered on the Nutch pages. Is there a features page and I'm just missing
it?
Questions I have include what kind of searches it supports (quoted, nested,
truncation, wildcarding [and where], Boolean), whether stemming is an option
and what it uses for stemming (and can you add exceptions/changes), Boolean
operator support (can you use Google-like plus or minus or are you stuck
with 1990s terms), weighted field searching, synonym support, what kinds of
indexes it builds, multi-format indexing, incremental indexing, spell-check
support, thesauri support, fielded searching, rank-by-reputation, and a lot
more.
I want to know how the search engine handles punctuation and special
characters (and what's configurable), document format support,
post-coordination options... well, on and on. Then there is how easy it is to
configure and how transparent its configuration is to a working
organization: does it require geeky command line stuff, or can a
knowledgeable manager enter a web or software interface to view or modify
settings?
How about result sorting? Deduping? Tinkering with relevance algorithms?
Ranking overrides? Etc.
I've evaluated Google client, and I too have a deal-breaking problem with it
being "secret sauce." I also note that many of its capabilities are not
configurable switches. If Google doesn't believe in stemming, you don't get stemming as
an option. I believe that's how it is configured at present. In
metadata-reliant databases, that's a killer. Basically it's designed for
organizations that aren't that interested in search and just want a
reasonably good product so they can go back to selling socks or whatever.
But we're librarians. I search, therefore I am. ;)
Karen G. Schneider
kgs at bluehighways.com