Web Search Engines "Made Simple"

j. y. c h e n judy at hotwired.com
Thu Nov 6 17:05:39 EST 1997


>>This illustrates the basic problem; each engine operates under its own
>>semi-concealed rules; the rules have to be semi-concealed to prevent
>>spammers from hijacking the engine.
>>  I actually *did* get an answer from Hotbot a few months ago, to a very
>>similar query "Roman sites". The concealed rule is that one of the words is
>>reserved: in my case "sites", in yours almost certainly "date" (I should
>>hope!).
>
>"date" is indeed a stopword in HotBot.  the way to test this when you get
>squirrely hits is to type in the suspect term by itself.  if it is a
>stopword, it will yield no results.  a subsequent search on "date rape" as
>an exact phrase yielded the same number of hits as "rape" by itself
>(stopwords are wildcarded in an exact phrase).  if you take a look at the
>breakdown of individual pagecounts, you'll notice that "date" occurs over
>11 million times in our database, which definitely makes it a stopword,
>since searching for it would significantly slow down retrieval time.
>
>while we do not have a printed list of stopwords (it is dynamic and changes
>with each crawl), we do have in our FAQ an explanation of how we index and
>retrieve pages:
>
>http://help.hotbot.com/faq/score.html
>
>hope this clears up some of the mystery!
>
>- judy
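
The stopword test described above can be tried on a toy index. This is only a rough sketch, not HotBot's actual implementation: the function names, the sample documents, and the "appears in more than half the documents" cutoff are all assumptions made up for illustration (the message only says the list is dynamic and that very frequent words end up on it). A stopword searched by itself returns nothing, and inside an exact phrase it is treated here as matching any single word.

```python
from collections import defaultdict

def build_index(docs, max_doc_fraction=0.5):
    """Build a toy inverted index; return (postings, stopwords).

    Assumption: a term found in more than max_doc_fraction of the
    documents is dropped from the index and treated as a stopword.
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    limit = max_doc_fraction * len(docs)
    stopwords = {t for t, ids in postings.items() if len(ids) > limit}
    for t in stopwords:
        del postings[t]
    return dict(postings), stopwords

def search_term(postings, term):
    """Single-term search; a stopword is not indexed, so it yields no hits."""
    return postings.get(term.lower(), set())

def search_phrase(docs, stopwords, phrase):
    """Exact-phrase search in which each stopword matches any one word."""
    pattern = [None if w in stopwords else w for w in phrase.lower().split()]
    hits = set()
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i in range(len(words) - len(pattern) + 1):
            if all(p is None or p == words[i + j] for j, p in enumerate(pattern)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample data: "sites" occurs in 4 of 5 documents.
docs = [
    "roman sites in italy",
    "travel sites for italy",
    "the roman roads",
    "museum sites online",
    "greek sites listed",
]
postings, stopwords = build_index(docs)
alone = search_term(postings, "sites")               # the suspect term by itself
roman = search_term(postings, "roman")
phrase = search_phrase(docs, stopwords, "roman sites")
```

With this sample data, searching "sites" alone finds nothing, and the phrase "roman sites" finds the same documents as "roman" alone. In this sketch the wildcard still needs some word in that position, so a document ending in "roman" would not match the phrase; whether a real engine behaves that way is not something the message states.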


___________________________________

j. y.  chen | hotbot  tutor | WIRED  d i g i t a l
(v) 415. 276 .8464  | (f) 415. 276. 8499
	http://www.hotbot.com

The beatings will continue until morale improves!

More information about the Web4lib mailing list