Web Search Engines "Made Simple"
Jian Liu
jiliu at script.lib.indiana.edu
Thu Nov 6 18:43:55 EST 1997
In other words, there is no way to search for "date rape"?
Jian
>
>
> >>This illustrates the basic problem; each engine operates under its own
> >>semi-concealed rules; the rules have to be semi-concealed to prevent
> >>spammers from hijacking the engine.
> >> I actually *did* get an answer from Hotbot a few months ago, to a very
> >>similar query "Roman sites". The concealed rule is that one of the words is
> >>reserved: in my case "sites", in yours almost certainly "date" (I should
> >>hope!).
> >
> >"date" is indeed a stopword in HotBot. the way to test this when you get
> >squirrely hits is to type in the suspect term by itself. if it is a
> >stopword, it will yield no results. a subsequent search on "date rape" as
> >an exact phrase yielded the same number of hits as "rape" by itself
> >(stopwords are wildcarded in an exact phrase). if you take a look at the
> >breakdown of individual pagecounts, you'll notice that "date" occurs over
> >11 million times in our database, which definitely makes it a stopword,
> >since searching for it would significantly slow down retrieval time.
> >
> >while we do not have a printed list of stopwords (it is dynamic and changes
> >with each crawl), we do have in our FAQ an explanation of how we index and
> >retrieve pages:
> >
> >http://help.hotbot.com/faq/score.html
> >
> >hope this clears up some of the mystery!
> >
> >- judy
>
>
> ___________________________________
>
> j. y. chen | hotbot tutor | WIRED d i g i t a l
> (v) 415. 276 .8464 | (f) 415. 276. 8499
> http://www.hotbot.com
>
> The beatings will continue until morale improves!
>
>
>
>
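The stopword behaviour Judy describes in the quoted message can be sketched in a few lines of Python. This is only a toy illustration, not HotBot's actual implementation: the tiny corpus, the cutoff STOPWORD_FRACTION, and the function names are all assumptions made up for the example. It treats any word found in more than half of the documents as a stopword, returns nothing when a stopword is searched by itself, and treats a stopword inside an exact phrase as a wildcard that matches any single word.

```python
import re

# Hypothetical toy corpus; HotBot's real index is not public.
DOCS = {
    1: "date rape awareness week",
    2: "the meeting date was changed",
    3: "acquaintance rape prevention",
    4: "release date for the new browser",
    5: "save the date for the conference",
    6: "library opening hours",
}

# Assumed cutoff: a word found in more than this fraction of the
# documents is treated as a stopword. The real rule is unknown.
STOPWORD_FRACTION = 0.5


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_stopwords(docs, fraction=STOPWORD_FRACTION):
    """Return the words that occur in more than `fraction` of the documents."""
    counts = {}
    for text in docs.values():
        for term in set(tokenize(text)):
            counts[term] = counts.get(term, 0) + 1
    limit = fraction * len(docs)
    return {term for term, n in counts.items() if n > limit}


def search_term(term, docs, stopwords):
    """Single-word search: a stopword by itself yields no results."""
    term = term.lower()
    if term in stopwords:
        return []
    return sorted(d for d, text in docs.items() if term in tokenize(text))


def search_phrase(phrase, docs, stopwords):
    """Exact-phrase search in which each stopword matches any single word."""
    # None marks a wildcard position in the phrase.
    pattern = [None if t in stopwords else t for t in tokenize(phrase)]
    if all(p is None for p in pattern):
        return []  # nothing but stopwords (or an empty phrase)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for start in range(len(tokens) - len(pattern) + 1):
            window = tokens[start:start + len(pattern)]
            if all(p is None or p == w for p, w in zip(pattern, window)):
                hits.append(doc_id)
                break
    return sorted(hits)


if __name__ == "__main__":
    stop = build_stopwords(DOCS)
    print("stopwords:", stop)
    print("date:", search_term("date", DOCS, stop))
    print("rape:", search_term("rape", DOCS, stop))
    print('"date rape":', search_phrase("date rape", DOCS, stop))
```

With this toy data, "date" by itself finds nothing, while the phrase "date rape" and the single word "rape" return the same documents, which mirrors the hit counts reported above.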