Full-text searching of several sites (Question)

Dave Lewis drlewi1 at srv.PacBell.COM
Wed Apr 10 14:29:24 EDT 1996


>Has anyone attempted to set up full-text searching of about 
>20 or 25 sites, none of which are search engines themselves?
>These could be universities, businesses, orgs, what have you.

>I have someone who wants to be able to do full-text searching of 
>20 sites and ONLY those 20 sites retrieving relevant information
>all at once.  He'd like to set up something locally that will
>let him do the searching.  

As someone else has already mentioned in response to this question, Harvest
is capable of filling this role.

The way this works is that a Harvest "gatherer" is set up with the
list of root URLs for the sites to be indexed, and a host-filter.cf
file is set up to limit the "expansion" of the search to resources
located only on those sites.  A "url-filter.cf" file can then be used
to exclude specific resources on those sites by path-name pattern
match.  This is useful for excluding cgi-bin directories, statistics
files, and the like.
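To make that concrete, here is a rough sketch of what the gatherer
configuration described above might look like.  The site names are
made up, and the exact directives and regex conventions vary between
Harvest releases, so treat this as illustrative and check the Harvest
User's Manual for the precise syntax your version expects:

    # gatherer.cf (illustrative): one RootNode per site to be indexed
    Gatherer-Name:  Twenty-Site Index
    Top-Directory:  /usr/local/harvest/gatherers/twenty-sites
    <RootNodes>
    http://www.example-univ.edu/
    http://www.example-biz.com/
    # ... the remaining 18 or so sites
    </RootNodes>

    # host-filter.cf (illustrative): confine enumeration to these hosts
    Allow www\.example-univ\.edu
    Allow www\.example-biz\.com
    Deny  .*

    # url-filter.cf (illustrative): skip CGI scripts and stat files
    Deny  .*/cgi-bin/.*
    Deny  .*\.stat.*
    Allow .*

The filters are applied during enumeration, so the gatherer never
follows links off the listed hosts, which is what keeps the index
restricted to ONLY those 20 sites.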

Dave Lewis
drlewi1 at pacbell.com 


More information about the Web4lib mailing list