[Web4lib] Why use the plus symbol before search terms when googling?

Patricia F Anderson pfa at umich.edu
Wed Sep 7 18:56:50 EDT 2005


Hi, Jeremy and David,

Google's normal searching includes stemming. The tilde does do synonym 
searching (a.k.a. thesaurus searching). There is nothing to indicate that 
Google gives up stemming when doing thesaurus searching, and I rather 
think it makes sense that they include both.

Regarding tilde searching, Google does not say what thesaurus they are 
using for the synonyms searched, but they do highlight the relevant terms 
in the search results. In theory, you could retroactively derive what the 
synonyms were. I have observed simple stemming for the synonym terms, such 
as plurals. The synonyms are not always what you might expect, and the 
results seem to be ranked irrespective of the term found -- whether it was 
the one you entered or one of the synonyms. The tilde is more useful with 
common words than with technical or jargon terms.
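
The idea of working backward from the highlighting to the synonyms could 
be sketched roughly as below. This is only an illustration under stated 
assumptions: it assumes the results page has been saved as HTML and that 
matched terms are wrapped in <b>...</b> tags, and the names 
BoldTermCollector and highlighted_terms, as well as the sample snippet 
text, are made up for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class BoldTermCollector(HTMLParser):
    """Collects the text found inside <b>...</b> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <b> elements we are currently inside
        self.terms = []  # highlighted strings, lowercased

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only text that sits inside a highlighted span.
        if self.depth > 0 and data.strip():
            self.terms.append(data.strip().lower())


def highlighted_terms(result_html, query_term):
    """Count highlighted terms other than the one that was typed in."""
    collector = BoldTermCollector()
    collector.feed(result_html)
    collector.close()
    counts = Counter(collector.terms)
    counts.pop(query_term.lower(), None)  # drop the term actually entered
    return counts


# Made-up snippet text, only to show the shape of the input.
sample = (
    "<p>Advice on <b>kids</b> and sleep</p>"
    "<p><b>Child</b> care for every <b>children</b>'s centre</p>"
    "<p>Games for <b>Kids</b></p>"
)
print(highlighted_terms(sample, "child"))
```

Whatever is left in the count after removing the entered term is a 
candidate synonym or stemmed variant; telling the two apart still takes a 
human eye.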

For some interesting examples of the tilde working as billed, try:

child
~child

cancer
~cancer

For an odd example of the tilde *not* working as expected, try it with 
related color terms, such as ~violet and ~purple. You might expect each to 
turn up as a synonym of the other, but the search results suggest 
otherwise.

Speaking of empiricism, some years ago I noticed that the use of the plus 
sign (+) with a term resulted in a shift in the ranking of the results, 
with the plussed term being ranked more highly in the results. This is no 
longer true, but it was something I really, really liked, and I wish it would 
come back or that search engines would offer this functionality in some 
other way. I know, I know -- term ordering, most important term first. 
Still, if you've already done that and aren't happy with the results 
ranking, it would be nice to give the most important concept an extra 
boost.

Jeremy, thanks for the tip about Bloom filters. I hadn't heard of those, 
and they are very interesting!
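
For readers who have not met Bloom filters before, a minimal sketch of the 
data structure follows. It is a teaching example only and says nothing 
about how Google actually produces its result estimates; the class name, 
the default sizes, and the use of salted SHA-256 as the hash family are 
all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Present only if every position for this item has been set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["child", "children", "kids"]:
    bf.add(word)

print("child" in bf)   # added items are always reported as present
print("violet" in bf)  # usually absent, but a false positive is possible
```

The point of the structure is the trade-off: it stores no items at all, 
only bits, so it is compact, at the cost of sometimes answering "present" 
for something that was never added.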

-- Patricia Anderson, pfa at umich.edu

>>          2.  It is very difficult to figure out exactly what Google does
>> because it is poorly documented and works inconsistently.
>
> Since it's so difficult, people tend to go for empiricism.  I have
> observed, through personal trial and error, that using a tilde does do
> stemming.  As for the number of results you're citing, they're wildly
> inaccurate.  They're "estimates", and AFAIK they are based on bloom
> filters.
> http://en.wikipedia.org/wiki/Bloom_filter


