Site Search Tool

Steffen Schilke steffen.schilke at GMAIL.COM
Mon May 12 15:26:19 EDT 2014


Hello,

I did a similar thing, but as we developed our own site we used plain Lucene
for the crawling and search, building a scheduled task for indexing and a
search interface. If you have your own page editor, it makes sense to crawl a
page when you publish it (not when you save it; publishing is the act of
setting it free into the wild ;-). If you delete pages, you should remove them
from the index in the delete function. Once in a while some maintenance is
necessary and a rebuild of the index is called for.
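
To make the publish/delete/rebuild idea concrete, here is a minimal sketch. It is
not our Lucene code: it swaps in a tiny in-memory inverted index in plain Python,
and the names PageIndex, on_publish, on_delete and rebuild are made up for
illustration. In a real setup these hooks would call into Lucene (or whatever
index you use) from your page editor.

```python
import re
from collections import defaultdict


class PageIndex:
    """Hypothetical stand-in for a real search index (not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of pages containing it
        self.terms_by_page = {}           # page id -> its terms, used on removal

    @staticmethod
    def tokenize(text):
        # Lowercase and split into alphanumeric terms.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def on_publish(self, page_id, text):
        # Called by the editor's publish action (not on save).
        # Re-publishing replaces the old entry, so drop it first.
        self.on_delete(page_id)
        terms = self.tokenize(text)
        self.terms_by_page[page_id] = terms
        for term in terms:
            self.postings[term].add(page_id)

    def on_delete(self, page_id):
        # Called by the editor's delete function; unknown ids are ignored.
        for term in self.terms_by_page.pop(page_id, set()):
            self.postings[term].discard(page_id)
            if not self.postings[term]:
                del self.postings[term]

    def rebuild(self, published_pages):
        # Occasional maintenance: throw the index away and re-index everything.
        self.postings.clear()
        self.terms_by_page.clear()
        for page_id, text in published_pages.items():
            self.on_publish(page_id, text)

    def search(self, query):
        # Return ids of pages that contain every term of the query.
        terms = self.tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


index = PageIndex()
index.on_publish("about", "About the library web services")
index.on_publish("hours", "Library opening hours")
print(sorted(index.search("library")))  # ['about', 'hours']
index.on_delete("hours")
print(sorted(index.search("library")))  # ['about']
```

The point of the design is that the index only changes in three places: publish,
delete, and the scheduled rebuild, so drafts that were saved but never published
do not show up in search results.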

Kind regards

sws


On Mon, May 12, 2014 at 5:56 PM, Mark Vega <vegamf at uci.edu> wrote:

> We are in the process of dropping our Google CSEs and attempting to build
> our own search tool using the free, open-source Apache Nutch and SOLR
> modules (Nutch for crawling and SOLR for indexing crawled content)
> overlaid with a PHP search interface.  We are using them to crawl and
> index all of our websites and databases and provide a unified search across
> all sources from a single search box.  We've only just started our first
> public BETA and I expect to be making adjustments for at least the next 6
> months to a year in order to get the search tool we want, but we've been
> dissatisfied with the Google CSE for some time and were not willing to pay
> for the Google Search Appliance. Once you learn how to configure and use
> these two modules, they are a powerful combination but be advised that,
> although the basics are pretty simple, there is a very steep learning
> curve to tweak the crawling, indexing and searching to get the results
> exactly as you want and as your users expect.
> --
> Mark Vega
> Programmer/Analyst
> University of California, Irvine Libraries - Web Services
> --
>
> ============================
>
> To unsubscribe: http://bit.ly/web4lib
>
> Web4Lib Web Site: http://web4lib.org/
>
> 2014-05-12
>
