inconsistencies web search performance
Nick Arnett
narnett at verity.com
Mon Nov 10 10:39:03 EST 1997
At 02:09 AM 11/8/97 -0800, Leonard Will wrote:
>Rather than relying on machines to group terms on the basis of chance
>co-occurrences, we should be developing a good thesaurus (or its big
>brother, a semantic net). Search software should then either use this
>automatically to find other terms to include in a search or should allow
>a user to interact with it at search time, by asking questions such as:
>"There's not much in the database on 'water closets'; would you like me
>to search for 'toilets' too, or should I include all sorts of bathroom
>fittings?"
The Cyc project has been working on the semantic network of all things, but
that's a rather large task... ;-) Maintenance is a huge issue; the number
of new terms entering the language is astonishing -- new product names,
buzzwords, etc. Automatically constructed semantic nets, like automatically
built thesauruses, err heavily on the side of finding too much.
But I think your suggestion of responding with choices is right on the
mark, for both scope and context. You mention scope, but not context.
Contextual disambiguation is one of the really hard problems in retrieval.
A word such as "bank" can have many meanings; I'm not aware of any software
that disambiguates such terms effectively. However, if you can return
results that are sorted into useful categories, the user can then disambiguate.
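As a rough illustration of that last point, here is a minimal sketch in
Python. It assumes each result already carries a category label assigned
somewhere upstream; the function name, the sample titles, and the category
names are all made up for the example and do not describe any particular
product.

```python
from collections import defaultdict


def group_by_category(results):
    """Group (title, category) pairs by category.

    Titles keep their original ranked order within each group.
    The category labels are assumed to be supplied by the caller.
    """
    groups = defaultdict(list)
    for title, category in results:
        groups[category].append(title)
    return dict(groups)


# Hypothetical hits for the ambiguous query "bank".
sample_results = [
    ("Opening a savings account", "Finance"),
    ("River bank erosion", "Geography"),
    ("Bank interest rates", "Finance"),
]

# Show one line per category so the user can pick the intended sense.
for category, titles in group_by_category(sample_results).items():
    print(f"{category}: {len(titles)} hit(s)")
```

The software does not have to guess which "bank" was meant; it only has
to present the groups and let the user choose.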
Nick
Product Manager, Knowledge Applications
Verity Inc. -- Connecting People with Information
Phone: 408-542-2164 Fax: 408-541-1600
Home office: 408-733-7613 narnett at verity.com
http://www.verity.com