inconsistencies web search performance

Edward Wigg E-WIGG at EVANSTON.LIB.IL.US
Sat Nov 8 13:28:04 EST 1997


At 01:31 PM 11/7/97 -0800, "Karen G. Schneider" <kgs at bluehighways.com>
wrote:
>Very interesting thread.  As for the concept of controlled vocabulary, in
>its defense, a vocabulary that could teach itself would be extremely
>useful.  Imho, the problem with controlled vocabularies--what makes them
>idiosyncratic--is that they are not associated with an intelligent
>mechanism that can reason, as the human brain does, "water closet = toilet"
>or "smile is near grin," then pocket that information for further use. I
>realize there are some tools that associate patterns and groups of words,
>but I don't know of a tool that can make software independently build its
>vocabulary without assistance from a human.  Or perhaps there is, and I'm
>just behind the ball curve...
>

There is some AI research that has tried to do this sort of thing -- the
Cyc Project is probably the best example. It starts with a core of
assertions about "common knowledge" in a large knowledge base and uses
these to make inferences that can be applied in various ways. Take a look
at: http://www.cyc.com/overview.html

One of the examples on their applications page is quite close to what you
are asking for:

"A news agency may possess a library of thousands of news photos; a movie
studio thousands of film clips; a software help desk thousands of text articles
too unwieldy to index directly. When such libraries must be searched, a
common solution is to attach to each item a short text caption describing
its contents. Thus a news photo of a soldier holding a gun to a woman's
head might be captioned "a soldier holding a gun to a woman's head", plus a
few tags for time and place, and then could be retrieved by querying for
"soldier" or "gun".

"This solution, while certainly adequate, is far from ideal. It would be
nice if the photo could also be retrieved by queries for "someone in
danger", or "a frightened person", or "a man threatening a woman". Such an
achievement, however, lies far beyond the abilities of even the most
sophisticated of traditional text-searching tools, all of which are
fundamentally based on simple string matching and synonyms. Most search
tools lack the ability to handle
natural-language queries, and even those that do have some NL capability
lack the background of commonsense knowledge required to make a connection
between having a gun to one's head and feeling frightened.

"CYC® is not crippled by such a liability. CYC® knows that guns shoot
bullets and are designed to kill people; that having a gun to one's head
therefore threatens one's life; that those whose lives are threatened feel
fear; and that the vast majority of soldiers are men. CYC® can therefore
conclude that the image in question is, in all likelihood, a good match for
each of the queries above."
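
To make the chain of reasoning in that quote a bit more concrete, here is a
toy sketch in Python of forward chaining over caption facts. This is only my
own illustration, not how Cyc itself is built: the rule list, the fact
strings, and the names forward_chain, matches and PHOTO are all hypothetical,
and a real knowledge base would use a far richer representation than plain
strings.

```python
# Toy forward chaining over caption facts. NOT Cyc's actual representation:
# every fact string and rule below is a made-up illustration.
# Each rule is (set of premises, conclusion).
RULES = [
    ({"gun pointed at person"}, "life is threatened"),
    ({"life is threatened"}, "someone in danger"),
    ({"life is threatened"}, "a frightened person"),
    ({"soldier"}, "probably a man"),
    ({"probably a man", "gun pointed at person", "woman"},
     "a man threatening a woman"),
]


def forward_chain(facts, rules=RULES):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def matches(query, caption_facts):
    """True if the query is among the facts derivable from the caption."""
    return query in forward_chain(caption_facts)


# Hypothetical facts extracted from the caption in the quoted example.
PHOTO = {"soldier", "gun pointed at person", "woman"}
```

With these made-up rules, matches("a frightened person", PHOTO) succeeds
even though the word "frightened" never appears in the caption, which is
the point of the example: the match comes from inference, not from string
matching.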

Edward

--------------------------------------------------------------
Edward Wigg                      "Just another guy, you know?"
Evanston Public Library             e-wigg at evanston.lib.il.us
Evanston, Illinois                  



More information about the Web4lib mailing list