Agents (Was Re: Students use of search engines.)

Nick Arnett narnett at Verity.COM
Thu Jun 6 14:14:15 EDT 1996


At 9:43 AM 6/6/96, Paul Hollands wrote:

>I suppose it would be a sort of 'current awareness' engine. It would of
>course control synonyms and homonyms in an interactive way depending on what
>sources it was interrogating. (I believe the Harvest system is able to do
>this, though I have a limited understanding of Harvesting...[Please feel
>free to jump in here anytime Jim'll. I'm getting out of my depth..])

Not really.  Harvest is an approach to distributed indexing.  It was
designed to accommodate multiple search engines, so it makes no assumptions
about the intelligence of the search.  Harvest didn't try to address the
problem of brokering searches to heterogeneous engines, which has many
sticky issues.

The chairs of the W3C workshop on distributed search, as well as at least
one of the key participants (from Netscape), are the people behind Harvest.
At last week's workshop in Cambridge, the group discussed the goal of
setting distributed search standards.  The problems start with figuring out
how to characterize a source so that a broker could reasonably decide where
to send a query.  There isn't a good known solution to that problem, though
new summarization techniques may offer one.  Then there's the query
language problem... and the problem of combining relevancy scores.  Getting
agreement on those three things will take time.
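
To make the score-combining problem concrete, here is a minimal sketch in
Python.  It assumes each engine returns raw relevancy scores on its own
private scale; the engine names, document IDs, and the choice of min-max
rescaling are all illustrative assumptions, not anything the workshop
agreed on.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale one engine's raw scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All hits scored the same; treat them as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge(results_by_engine: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Merge per-engine result lists, keeping each document's best rescaled score."""
    merged: Dict[str, float] = {}
    for scores in results_by_engine.values():
        for doc, s in normalize(scores).items():
            merged[doc] = max(merged.get(doc, 0.0), s)
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results: engine_a scores in 0..1, engine_b on an unbounded scale.
results = {
    "engine_a": {"doc1": 0.92, "doc2": 0.40, "doc3": 0.15},
    "engine_b": {"doc3": 870.0, "doc4": 310.0, "doc5": 120.0},
}
ranked = merge(results)
for doc, score in ranked:
    print(f"{doc}\t{score:.2f}")
```

Note what the rescaling does: each engine's weakest hit lands at zero and
its strongest at one, whether or not the two engines' best hits are really
comparable.  That is exactly the kind of arbitrary choice a standard would
have to settle.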

There were at least a couple of library people participating in the
workshop who are on this list (I don't count myself as a library person;
I'm a library user who's here to understand the librarians' points of
view).

>I suppose that the best approach would be to have existing robots producing
>the databases of inverted indexes as they do now [Alta Vistas & Inktomis of
>this world] and then let your own agent army loose on the databases rather
>than roaming the Net itself.
>
>Instead of having a browser perhaps you would just have a client to contact
>your broker.

You haven't added any efficiency if you use inverted indexes (even with the
full proximity, etc., information that they may contain) to choose sources;
that's what engines like ours do!  In theory, some sort of source summary
could be created that is more compact and much faster to use for deciding
where to route a query... but that remains purely theoretical.
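
For illustration only, one hypothetical form such a source summary could
take is a table of per-term document frequencies, with no positions or
proximity data.  The sketch below (Python; the source names, sample
documents, and scoring rule are assumptions made up for the example) routes
a query to whichever source's summary covers its terms best.  It says
nothing about whether this would work at real scale.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Summary = Tuple[Counter, int]  # (term -> number of docs containing it, total docs)


def summarize(docs: Iterable[str]) -> Summary:
    """Build a compact summary of a source: document frequencies only."""
    df: Counter = Counter()
    n = 0
    for text in docs:
        n += 1
        df.update(set(text.lower().split()))  # count each term once per document
    return df, n


def route(query: str, summaries: Dict[str, Summary], top_k: int = 1) -> List[str]:
    """Return up to top_k source names whose summaries best match the query."""
    terms = query.lower().split()
    scored = []
    for name, (df, n) in summaries.items():
        # Score = sum over query terms of the fraction of docs containing the term.
        score = sum(df[t] / n for t in terms) if n else 0.0
        scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Two made-up sources with a couple of documents each.
summaries = {
    "library": summarize([
        "cataloging rules for serials",
        "library cataloging and classification",
    ]),
    "networking": summarize([
        "tcp congestion control",
        "routing protocols for ip networks",
    ]),
}

print(route("cataloging serials", summaries))
print(route("routing protocols", summaries))
```

A summary like this is far smaller than an inverted index, but it also
throws away most of what the index knows, which is part of why the idea
remains unproven.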

>The clever bit is that it would monitor your communications when you surf
>the www/usenet and also your email and I suppose your IRC and MOO activities
>and interactively tweak your profile as your interests change. You would use
>your 'broker' client for all of these services and it would filter the
>information as it came through.

Clearly, a lot of research could be done (and has been done) with regard to
passively watching a computer user's activities and attempting to derive
information about his or her interests.  The most striking data point so
far seems to be that computers have very little data to work with.  They're
blind and deaf, so there's a great deal of ambiguity in the bits that
travel back and forth.  The ambiguity tends to amplify recall (finding lots
of potentially interesting documents) while doing much less for precision
(finding only the interesting documents).
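
The recall/precision trade-off can be put in numbers with the standard
definitions.  In the small Python sketch below, the document IDs and the
two result lists are invented for the example: a profile built from
ambiguous signals casts a wide net, so it finds everything relevant but
buries it in noise.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: four documents the user actually cares about.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow, well-targeted query versus a broad one driven by ambiguous signals.
narrow = ["d1", "d2"]
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]

narrow_pr = precision_recall(narrow, relevant)
broad_pr = precision_recall(broad, relevant)
print("narrow:", narrow_pr)
print("broad: ", broad_pr)
```

The broad result set reaches full recall, but six of its ten documents are
noise; the narrow one is all signal and misses half of what matters.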

In any event, it's a fascinating area, but full of problems for which there
are no known solutions.  As usual with expert systems, AI, or whatever you
call this stuff (a cataloging issue...), even knowledgeable people tend to
assume that the state of the art is smarter than it really is.  Instead of
accomplishing our goals, we tend to redefine them as we continue to
comprehend more about human intelligence.

Nick




More information about the Web4lib mailing list