inconsistencies web search performance

Edward Wigg E-WIGG at EVANSTON.LIB.IL.US
Sat Nov 15 14:27:30 EST 1997


At 01:37 PM 11/14/97 -0800, Steve Harter <harter at indiana.edu> wrote in
response to Nick Arnett <narnett at verity.com>:
>> The bad ones are exact because they only do Boolean searching; the good
>> ones are inexact because they use fuzzy logic.  The smarter the search
>> engines get, the less exact the relevancy will be.  This is the nature of
>> information.  Ask the editor of a newspaper why a particular news article
>> belongs on the front page (i.e., is considered highly relevant to the
>> paper's target audience) and you won't get a set of logical rules;
>> relevancy ranking is a matter of opinion.  Fundamentally, subject-based
>> searching of text, when done well, is subjective.  Further, relevancy
>> should have to do with much more than the subject.  Sometimes a document
>> is relevant because it is well-written, authoritative, provocative,
>> humorous, popular or has other qualities that have nothing to do with the
>> subject.  Only the last of those -- popular -- is likely to be measurable
>> by automation.
>> 
>
>There is a large theoretical and empirical literature in information
>science on the nature of relevance.  In recent years there has been much
>research in user-based relevance (also called psychological, cognitive, or
>situational relevance), in which the kinds of criteria described by Nick,
>among many others, have been found to describe real people in real
>situations....

What this skirts, without explicitly stating it, is the idea that relevance
is subjective and subject to change, differing between people and for an
individual on different occasions. The frequency and pattern of the
occurrence of search terms within documents may indeed be one of the
factors that help determine relevance, but a search engine that does only
this one form of ranking, without saying so, and then claims in grand terms
that its results are "ranked by relevance" is being unhelpful or misleading.
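
As a rough illustration of the kind of ranking described above, here is a
minimal Python sketch that scores documents purely by how often the query
terms occur in them. The names `tokenize` and `rank_by_term_frequency`, the
scoring formula, and the sample documents are all assumptions made for the
example; no particular search engine is claimed to work exactly this way.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_term_frequency(query, documents):
    # Hypothetical scorer: the fraction of a document's words that are
    # query terms. Nothing about the searcher or the document's quality
    # enters into the score.
    terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        hits = sum(counts[t] for t in terms)
        score = hits / total if total else 0.0
        scored.append((doc_id, score))
    # Highest score first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Made-up sample documents, for illustration only.
docs = {
    "a": "dog food reviews: the best dog food for puppies",
    "b": "a history of the public library",
    "c": "food safety rules for restaurants",
}
ranking = rank_by_term_frequency("dog food", docs)
```

A score like this says nothing about whether a document is authoritative or
well written, which is the gap between this one form of ranking and
relevance as a searcher actually experiences it.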

It is possible to conceive of gathering information from a searcher to do
some more sophisticated form of relevancy ranking, either by tracking which
links are followed or by explicitly asking for input from the user. This
might currently be impossibly cumbersome for web search engines, but what
concerns me even more is the privacy issue: to do a good job the engine would
need to know quite a lot about you that would be valuable information to
many gatherers of personal information. A user-conducted search may be more
convenient than a reference interview followed by a search helped by a
reference librarian, but at least you can be fairly certain that the
reference librarian is not selling information about your preference in dog
food to Ralston Purina!

Edward.


More information about the Web4lib mailing list