Full-text searching of several sites (Question)
Dave Lewis
drlewi1 at srv.PacBell.COM
Wed Apr 10 14:29:24 EDT 1996
>Has anyone attempted to set up full-text searching of about
>20 or 25 sites, none of which are search engines themselves?
>These could be universities, businesses, orgs, what have you.
>I have someone who wants to be able to do full-text searching of
>20 sites and ONLY those 20 sites retrieving relevant information
>all at once. He'd like to set up something locally that will
>let him do the searching.
As someone else mentioned in an earlier reply to this question, Harvest
is capable of filling this role.
The way this works is that a Harvest "gatherer" is set up with
the list of root URLs for the sites to be indexed, and then a
host-filter.cf file is set up to limit the "expansion" of the
crawl to resources located only on those sites. A "url-filter.cf"
file can additionally be used to exclude certain resources on those
sites by path-name pattern match; this is handy for excluding
cgi-bin scripts, access-statistics files, etc.
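As a rough sketch of what that looks like (the file names are the ones the Harvest documentation uses, but the exact directive syntax varies between Harvest releases, and all hostnames and path patterns below are invented placeholders, not real sites):

```
# Gatherer root URLs (sketch) -- one entry per site to be indexed
<RootNodes>
http://www.example-univ.edu/
http://www.example-biz.com/
# ... the rest of the 20 root URLs ...
</RootNodes>

# host-filter.cf (sketch) -- allow expansion only on the listed hosts;
# entries are regular expressions, assumed to be matched in order
Allow www\.example-univ\.edu
Allow www\.example-biz\.com
Deny  .*

# url-filter.cf (sketch) -- prune unwanted paths on the allowed hosts
Deny  /cgi-bin/
Deny  /stats/
Allow .*
```

The net effect is that the gatherer starts from the root URLs, follows links only to hosts the host filter allows, and skips any URL the URL filter denies, so the resulting index covers exactly the chosen sites.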
Dave Lewis
drlewi1 at pacbell.com
More information about the Web4lib
mailing list