[Web4lib] web site search engines

Mark Costa markrcosta at gmail.com
Fri Sep 23 08:20:01 EDT 2005


I just wanted to thank all of you for the prompt, detailed responses to my
question.

On 9/22/05, Tito Sierra <tito_sierra at ncsu.edu> wrote:
>
> Hi Karen,
>
> Although I am no Nutch or Lucene expert, I can relate some
> information based on my own experience as a Nutch user.
>
> Regarding which search features are supported, you need to look
> at the Lucene docs for specific answers. Here is a very brief
> overview (with links to more info):
> http://lucene.apache.org/java/docs/features.html
>
> Many of your questions seem directed at search query options (e.g.
> boolean, wildcard, fielded search). This documentation reveals some
> of the query options supported:
> http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> Here is a book dedicated to Lucene:
> http://lucenebook.com/
>
> For what it's worth, I think the Lucene documentation is much farther
> along than the Nutch documentation.
>
> Nutch provides a web crawler front-end to the Lucene search libraries
> (where all the IR stuff happens). In Nutch you can configure all
> your crawler settings such as what URLs to crawl, how many links to
> follow, etc. Tinkering with relevance algorithms is possible at the
> Lucene level.
>
> I don't know of a zero-cost web interface for configuring Lucene-based
> apps. However, I have read that SearchBlox (http://www.searchblox.com/)
> offers a product that is a web-based admin interface to Lucene.
>
> Hope this helps.
>
> Tito
>
> On Sep 22, 2005, at 2:08 PM, K.G. Schneider wrote:
>
> > Some of the questions I have when I evaluate search engines don't seem
> > to be answered on the Nutch pages. Is there a features page and I'm
> > just missing it?
> >
> > Questions I have include what kind of searches it supports (quoted,
> > nested, truncation, wildcarding [and where], Boolean), whether stemming
> > is an option and what it uses for stemming (and can you add
> > exceptions/changes), Boolean operator support (can you use Google-like
> > plus or minus or are you stuck with 1990s terms), weighted field
> > searching, synonym support, what kinds of indexes it builds,
> > multi-format indexing, incremental indexing, spell-check support,
> > thesauri support, fielded searching, rank-by-reputation, and a lot
> > more.
> >
> > I want to know how the search engine handles punctuation and special
> > characters (and what's configurable), document format support,
> > post-coordination options... well, on and on. Then there is how easy it
> > is to configure and how transparent its configuration is to a working
> > organization: does it require geeky command line stuff, or can a
> > knowledgeable manager enter a web or software interface to view or
> > modify settings?
> >
> > How about result sorting? Deduping? Tinkering with relevance
> > algorithms? Ranking overrides? Etc.
> >
> > I've evaluated Google client, and I too have a deal-breaking problem
> > with it being "secret sauce." I also note that many of its capabilities
> > are not switches. If Google doesn't believe in stemming, you don't get
> > stemming as an option. I believe that's how it is configured at
> > present. In metadata-reliant databases, that's a killer. Basically it's
> > designed for organizations that aren't that interested in search and
> > just want a reasonably good product so they can go back to selling
> > socks or whatever.
> >
> > But we're librarians. I search, therefore I am. ;)
> >
> > Karen G. Schneider
> > kgs at bluehighways.com
> >
> > _______________________________________________
> > Web4lib mailing list
> > Web4lib at webjunction.org
> > http://lists.webjunction.org/web4lib/
> >
>



--
Mark R. Costa
Off-Campus Librarian, Eastern Region
Central Michigan University

