Web Search Engines "Made Simple"
j. y. c h e n
judy at hotwired.com
Thu Nov 6 17:05:39 EST 1997
>>This illustrates the basic problem; each engine operates under its own
>>semi-concealed rules; the rules have to be semi-concealed to prevent
>>spammers from hijacking the engine.
>> I actually *did* get an answer from Hotbot a few months ago, to a very
>>similar query "Roman sites". The concealed rule is that one of the words is
>>reserved: in my case "sites", in yours almost certainly "date" (I should
>>hope!).
>
>"date" is indeed a stopword in HotBot. the way to test this when you get
>squirrely hits is to type in the suspect term by itself. if it is a
>stopword, it will yield no results. a subsequent search on "date rape" as
>an exact phrase yielded the same number of hits as "rape" by itself
>(stopwords are wildcarded in an exact phrase). if you take a look at the
>breakdown of individual pagecounts, you'll notice that "date" occurs over
>11 million times in our database, which definitely makes it a stopword,
>since searching for it would significantly slow down retrieval.
>
>while we do not have a printed list of stopwords (it is dynamic and changes
>with each crawl), we do have in our FAQ an explanation of how we index and
>retrieve pages:
>
>http://help.hotbot.com/faq/score.html
>
>hope this clears up some of the mystery!
>
>- judy
___________________________________
j. y. chen | hotbot tutor | WIRED d i g i t a l
(v) 415. 276 .8464 | (f) 415. 276. 8499
http://www.hotbot.com
The beatings will continue until morale improves!
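
The stopword behavior described in the message above can be illustrated with a toy index. This is only a rough sketch, not HotBot's actual implementation: the function names, the sample documents, and the fixed frequency threshold are all assumptions made up for illustration (the message says only that the real list is dynamic and changes with each crawl). It shows the two observable effects mentioned: a stopword searched by itself yields no results, and a stopword inside an exact phrase acts as a wildcard, so the phrase returns the same hits as the remaining word alone.

```python
def build_index(docs):
    """Map each word to a list of (doc_id, position) postings."""
    index = {}
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, []).append((doc_id, pos))
    return index


def find_stopwords(index, threshold):
    """Hypothetical rule: any word with more than `threshold` occurrences."""
    return {w for w, postings in index.items() if len(postings) > threshold}


def search_term(index, stopwords, term):
    """Single-term search; a stopword by itself yields no results."""
    term = term.lower()
    if term in stopwords:
        return set()
    return {doc_id for doc_id, _ in index.get(term, [])}


def search_phrase(index, stopwords, phrase):
    """Exact-phrase search in which stopwords are treated as wildcards.

    Only the non-stopword terms are checked, at their relative offsets;
    the slots occupied by stopwords are not checked at all.
    """
    words = phrase.lower().split()
    anchors = [(off, w) for off, w in enumerate(words) if w not in stopwords]
    if not anchors:
        return set()
    postings = {w: set(index.get(w, [])) for _, w in anchors}
    first_off, first_word = anchors[0]
    hits = set()
    for doc_id, pos in postings[first_word]:
        start = pos - first_off
        if all((doc_id, start + off) in postings[w] for off, w in anchors):
            hits.add(doc_id)
    return hits


# Made-up sample data; "date" occurs 4 times, every other word at most twice.
docs = [
    "date of the roman empire",
    "roman date palm",
    "date rape statistics",
    "rape seed oil",
    "the date",
]
index = build_index(docs)
stopwords = find_stopwords(index, threshold=3)

print(search_term(index, stopwords, "date"))
print(search_phrase(index, stopwords, "date rape"))
print(search_term(index, stopwords, "rape"))
```

With this sample data, the test suggested in the message works as described: searching for the suspect term alone returns nothing, and the phrase search returns the same documents as the non-stopword term by itself.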