Site Search Tool

Christian Pietsch chr.pietsch+web4lib at GOOGLEMAIL.COM
Wed May 14 03:27:36 EDT 2014


Dear Heather,

Most search solutions are based on Apache Lucene. I agree with Mark
that Apache Nutch is an obvious choice for crawling, but you probably
do not need it given that all your files are local.

These days, most programmers do not use Lucene directly but use Solr
or Elasticsearch as an intermediate layer. Wikipedia and its sister
projects just switched from pure Lucene to Elasticsearch. Here is a
short summary of their reasons to prefer Elasticsearch over Solr:
https://www.mediawiki.org/wiki/Search#Solr_vs_Elasticsearch
I would add that replication and scaling are easier with Elasticsearch.

For your hand-coded website, this could all be overkill. Ease of
setup, integration, and maintenance might outweigh scalability for
you. So in addition to Elasticsearch, I would recommend looking at
Omega, which is based on Xapian <http://xapian.org/docs/omega/>,
Sphider (if you can run PHP and MySQL), or YaCy (if you can run Java).
For really small websites there are client-side jQuery search plugins.
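
To give an idea of how little is needed for a really small site, here is
a minimal sketch in Python of what such a search boils down to: an
inverted index from words to pages, built from local HTML files without
any crawler. This is not how Lucene, Xapian, or any of the tools above
work internally; there is no ranking, stemming, or phrase search. All
function names and the "site" directory are my own assumptions for
illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Lowercase ASCII words and numbers only; a real tool handles far more.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages maps a page name to its HTML; returns word -> set of page names."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in tokenize(html_to_text(html)):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def load_pages(root):
    """Read every .html file below root (a hypothetical site directory)."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.read_text(encoding="utf-8", errors="replace")
        for p in root.rglob("*.html")
    }
```

You would call build_index(load_pages("site")) once after editing the
site and then search(index, "opening hours") per query. The index could
also be written out as JSON for a client-side script to load, which is
roughly the approach the jQuery plugins take.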

Web Content Management Systems such as Drupal come with a built-in
search facility. Whatever prevented you from using a CMS might also
prevent you from setting up your own site search service, I'm afraid.

I hope this helps.
Christian

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec · Library Technology and Knowledge Management
  Bielefeld University Library, Bielefeld, Germany

============================

Web4Lib Web Site: http://web4lib.org/