Internal Revenue Service/AltaVista

Nick Arnett narnett at verity.com
Wed Apr 23 12:00:45 EDT 1997


At 07:35 AM 4/23/97 -0700, Steve Harter wrote:

>Move your mouse cursor over the terms in the graph for a full view of the
>hierarchies created.  This is definitely worth a look.  The initial query
>is easily modified by clicking on terms to be added.

The related terms are interesting, but they also uncover the weakness of
search software that can't accrue evidence.  Add a bunch of terms with
LiveTopics and you'll quickly get down to zero documents, since it's still
Boolean underneath.  What you immediately want when you have related terms
like this is an operator that says "the more of these, the better" instead
of "all or nothing".  I don't see any way to do the former with AltaVista.
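
To make the contrast concrete, here is a minimal Python sketch on a
hypothetical three-document collection.  `boolean_and` is all-or-nothing,
the way required-term Boolean queries behave; `accrue` ranks documents by
the fraction of query terms they contain, a simple coordination-level
score standing in for an evidence-accruing operator.  The function names
and documents are illustrative assumptions, not any product's actual
implementation.

```python
# Minimal sketch: all-or-nothing Boolean AND vs. "the more of these,
# the better" scoring. Documents and names are hypothetical.

# Toy collection: each document reduced to the set of terms it contains.
DOCS = {
    "d1": {"internal", "revenue", "service", "withholding", "employer"},
    "d2": {"internal", "revenue", "service", "withholding"},
    "d3": {"internal", "revenue", "service", "audit"},
}


def boolean_and(docs, terms):
    """Return only the documents that contain every query term."""
    return sorted(doc for doc, words in docs.items() if terms <= words)


def accrue(docs, terms):
    """Rank documents by the fraction of query terms they contain.

    Each matching term adds evidence; a document missing some terms
    still scores, just lower. Documents with no matches are dropped.
    """
    scores = {doc: len(terms & words) / len(terms) for doc, words in docs.items()}
    ranked = [(doc, score) for doc, score in scores.items() if score > 0]
    # Highest score first; ties broken by document id for stable output.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Related terms added one after another, as with LiveTopics.
    query = {"withholding", "employer", "payroll", "tax"}
    print(boolean_and(DOCS, query))  # []
    print(accrue(DOCS, query))       # [('d1', 0.5), ('d2', 0.25)]
```

Each added related term can only shrink the Boolean result set, whereas
under the scoring version it changes the ranking without throwing away
partial matches.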

This is closely related to the fact that you see the same stem repeated in
the related terms.  Look under "withholding" and you'll see
"withholding, withhold, withheld," for example.  Most engines automatically
stem this way so that you don't have to enter each variation.  And what
you'd really want to do with them is an "OR," which you have to manually
edit into the query.  Here's the result -- search '+"Internal Revenue
Service" +withholding' and you get 2,000 documents; add the stems and you
get 400 documents.  That's exactly the opposite of what should happen when
you stem -- you should find all of the variations, not the Boolean "AND" of
the stems.
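
The same effect shows up in a toy example.  In this hypothetical Python
sketch, the token `irs` stands in for the phrase "Internal Revenue
Service" and the collection is invented; it only illustrates why ANDing
stem variants shrinks the result set while ORing them grows it.  The
function names are illustrative assumptions.

```python
# Minimal sketch: treating stem variants as a required AND vs. an OR group.
# Documents, the "irs" token, and the variant list are hypothetical.

DOCS = {
    "d1": {"irs", "withholding"},
    "d2": {"irs", "withheld"},
    "d3": {"irs", "withholding", "withhold", "withheld"},
    "d4": {"irs", "audit"},
}

# Variants of one stem, as listed under "withholding" in the related terms.
VARIANTS = {"withholding", "withhold", "withheld"}


def require_all_variants(docs, required, variants):
    """Boolean AND of the stems: a document must contain every variant."""
    return sorted(
        doc for doc, words in docs.items()
        if required <= words and variants <= words
    )


def require_any_variant(docs, required, variants):
    """Stemming as it should work: any one variant satisfies the term."""
    return sorted(
        doc for doc, words in docs.items()
        if required <= words and not variants.isdisjoint(words)
    )


if __name__ == "__main__":
    required = {"irs"}
    print(require_all_variants(DOCS, required, {"withholding"}))  # ['d1', 'd3']
    print(require_all_variants(DOCS, required, VARIANTS))         # ['d3']
    print(require_any_variant(DOCS, required, VARIANTS))          # ['d1', 'd2', 'd3']
```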

One more limitation -- there are no phrases in the "topics."  They are all
single words.  This seriously limits the topics' expressiveness.
Linguistic analysis would uncover noun phrases such as "employer
withholding," which would be a more focused search, closer to what people
ordinarily think of as topics.

Again, I feel I should say that we have a healthy respect for the size of
AltaVista.  That's no mean feat.  But the search capabilities are far
inferior to commercially available search software.  Part of my motivation
in writing this is to counter the notion that AltaVista represents
state-of-the-art search, even though it represents state-of-the-art size.

Nick

---------------------------------------
Verity Inc. -- Connecting People with Information

Product Manager, Categorization and Visualization
408-542-2164; fax 408-541-1600; home office 408-733-7613
http://www.verity.com/

Verity Inc.
894 Ross Drive
Sunnyvale, CA 94089


