[Web4lib] Why use the plus symbol before search terms when googling?
Patricia F Anderson
pfa at umich.edu
Wed Sep 7 18:56:50 EDT 2005
Hi, Jeremy and David,
Google's normal searching includes stemming. The tilde does do synonym
searching (a.k.a. thesaurus searching). There is nothing to indicate that
Google gives up stemming when doing thesaurus searching, and I rather
think it makes sense that they include both.
Regarding tilde searching, Google does not say what thesaurus they are
using for the synonyms searched, but they do highlight the relevant terms
in the search results. In theory, you could retroactively derive what the
synonyms were. I have observed simple stemming for the synonym terms, such
as plurals. The synonyms are not always what you might expect, and the
results seem to be ranked irrespective of the term found -- whether it was
the one you entered or one of the synonyms. The tilde is more useful with
common words than with technical or jargon terms.
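Deriving the synonyms from the highlighted terms could be sketched roughly
as follows. This is only an illustration under assumptions: it supposes you
have already saved some result snippets as text in which matched terms are
wrapped in <b>...</b> tags, the function name highlighted_terms is
hypothetical, and the sample snippets are made up rather than real results.

```python
import re
from collections import Counter

# Assumption: matched terms in each snippet are wrapped in <b>...</b> tags.
BOLD = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)

def highlighted_terms(snippets, query_term):
    """Count highlighted terms other than the query term itself."""
    query = query_term.lower()
    counts = Counter()
    for snippet in snippets:
        for match in BOLD.findall(snippet):
            term = match.strip().lower()
            if term and term != query:
                counts[term] += 1
    return counts

# Made-up snippets for illustration only, not real search results.
sample = [
    "Resources for <b>kids</b> and <b>children</b>",
    "<b>Child</b> safety tips for parents",
    "Games for <b>Kids</b>",
]
print(highlighted_terms(sample, "child").most_common())
```

The tally mixes true synonyms with stemmed forms such as plurals, so the
list still needs sorting by hand.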
For some interesting examples of the tilde working as billed, try:
child
~child
cancer
~cancer
For an odd example of the tilde *not* working as expected, try using it
with related color terms, such as ~violet and ~purple (which you might
expect to be treated as synonyms, but are not, judging by the search
results).
Speaking of empiricism, some years ago I noticed that the use of the plus
sign (+) with a term resulted in a shift in the ranking of the results,
with the plussed term being ranked more highly in the results. This is no
longer true, but was something I really really liked, and I wish it would
come back or that search engines would offer this functionality in some
other way. I know, I know -- term ordering, most important term first.
Still, if you've already done that and aren't happy with the results
ranking, it would be nice to give the most important concept an extra
boost.
Jeremy, thanks for the tip about Bloom filters. I hadn't heard about
those, and they are very interesting!
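For anyone else meeting Bloom filters for the first time, here is a minimal
Python sketch of the data structure. It shows only the general idea (a bit
array plus several hash positions per item, giving fast "possibly present"
answers with occasional false positives) and says nothing about how Google
actually produces its estimates. The class name, the sizes, and the use of
salted SHA-256 are all assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for word in ["child", "children", "cancer"]:
    bf.add(word)
print(bf.might_contain("child"))   # True: added items are always found
print(bf.might_contain("violet"))  # usually False, but false positives can occur
```

Because several items can set the same bits, the filter trades exactness
for a small, fixed amount of memory.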
-- Patricia Anderson, pfa at umich.edu
>> 2. It is very difficult to figure out exactly what Google does
>> because it is poorly documented and works inconsistently.
>
> Since it's so difficult, people tend to go for empiricism. I have
> observed, through personal trial and error, that using a tilde does do
> stemming. As for the number of results you're citing, they're wildly
> inaccurate. They're "estimates", and AFAIK they are based on Bloom
> filters.
> http://en.wikipedia.org/wiki/Bloom_filter