*implementing* search engines for a campus

Michael Alan Dorman mdorman at caldmed.med.miami.edu
Wed Dec 20 11:19:37 EST 1995


I'll just mention quickly that we installed a search engine (swish) for
our web server --- and placed the query form for it on the front page,
which I think is an important move --- on November 7th.  Just a touch over
six weeks, in other words. 

Looking at our statistics over that time, we have found that the CGI
(wwwwais) that does the search is our second most requested document, with
roughly 3/5ths as many hits as the front page.

It has, in turn, about 3 times as many hits as the third most requested
document. 

Our experience certainly seems to justify the decision to create the
index. 

Now, on to privacy...

On Wed, 20 Dec 1995, Prentiss Riddle wrote:
> A bigger problem for us is access control and the prevention of
> "leakage".  We have many documents which are not for export beyond the
> boundaries of our campus, and my users have let me know that they are
> not willing to tolerate a search engine which leaks even titles or
> excerpted lines in violation of access control rules.

Hmm.  If those documents are linked to by publicly accessible documents,
that's quite a requirement they've got there.  Of course, if they aren't,
then it's a non-issue.

Anyway, it would seem to me that Harvest might provide an easier solution. 
Something like this: 

Each web server runs a gatherer. 

Each gatherer indexes only information appropriate for outside
consumption.  Presumably this puts responsibility for choosing what to
index in the hands of the people who are making that determination.

Your primary web server runs a broker in addition to its own gatherer. 

This broker presents a union of the data indexed by each of the remote
gatherers. 

As I understand the capabilities of Harvest, this is the most efficient
and therefore recommended configuration. 
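
To make the arrangement above concrete, here is a toy model of it in
Python.  This is not Harvest's actual interface or data format; the
names Record, Gatherer and Broker and their methods are invented purely
for illustration.  Each gatherer holds only what its own server has
chosen to export, and the broker searches the union of those exports.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One indexed document (hypothetical shape, not Harvest's format)."""
    url: str
    title: str


@dataclass
class Gatherer:
    """Runs on one web server and holds only what that server exports."""
    server: str
    records: list = field(default_factory=list)

    def export(self):
        # Hand the broker a copy so it cannot modify the local index.
        return list(self.records)


@dataclass
class Broker:
    """Presents the union of the data exported by each gatherer."""
    gatherers: list

    def index(self):
        merged = {}
        for gatherer in self.gatherers:
            for record in gatherer.export():
                # Key on URL so a document seen twice appears once.
                merged.setdefault(record.url, record)
        return merged

    def search(self, term):
        term = term.lower()
        return sorted(
            record.url
            for record in self.index().values()
            if term in record.title.lower()
        )
```

The point of the sketch is that the broker never sees anything a
gatherer did not export, so the filtering happens on the server that
owns the documents.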

Further, you might be able to have multiple gatherer datasets on each
server, one of publicly accessible material and one of restricted
material, and you could set up another broker on a restricted server that
could allow your internal users access to an index of your entire web. 
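
A rough sketch of that two-dataset idea, again as a toy Python model
rather than anything Harvest itself provides: the SERVERS table, the
dataset names "public" and "restricted", and the functions build_index
and search are all assumptions made up for the example.  The public
broker is built from the public datasets only, while the internal
broker (which would live on a restricted server) is built from both.

```python
# Hypothetical per-server datasets: (url, title) pairs.
SERVERS = {
    "www": {
        "public": [("http://www.example.edu/", "Campus home page")],
        "restricted": [("http://www.example.edu/staff/", "Staff directory")],
    },
    "lib": {
        "public": [("http://lib.example.edu/hours", "Library hours")],
        "restricted": [("http://lib.example.edu/budget", "Library budget")],
    },
}


def build_index(servers, include_restricted):
    """Merge the chosen datasets from every server into one index."""
    names = ["public", "restricted"] if include_restricted else ["public"]
    index = {}
    for datasets in servers.values():
        for name in names:
            for url, title in datasets.get(name, []):
                index[url] = title
    return index


def search(index, term):
    """Return the URLs whose title contains the term, sorted."""
    term = term.lower()
    return sorted(url for url, title in index.items() if term in title.lower())


# The broker on the public server sees only public datasets.
public_index = build_index(SERVERS, include_restricted=False)

# The broker on the restricted server sees the entire web.
internal_index = build_index(SERVERS, include_restricted=True)
```

Because the restricted titles never enter public_index at all, a search
against the public broker cannot leak them; access control on the
internal broker is then a matter for the restricted server itself.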

Mike.
--
Michael Alan Dorman                                   Head of Systems
mdorman at caldmed.med.miami.edu           Louis Calder Memorial Library
(305) 243-5530                     University of Miami Medical School


