[Web4lib] Another Google question
Richard Wiggins
richard.wiggins at gmail.com
Wed Jul 6 12:16:42 EDT 2005
And that one thing that Google does well is, in most cases, put SOMETHING
apparently useful at the top of the hit list. They are not trying to be
comprehensive, or exhaustive. That's why Google has a market capitalization
of $85 billion right now -- several times more than GM. There's something
useful at the top of the hit list. The percent of people who crawl through
hundreds of hit list entries is very, very low.
There are many things that Google does at the tail end of the hit list that
don't make sense to trained searchers -- folks looking for obscure facts at
the tail end of the Zipf curve.
And PageRank can be just plain dumb at times.
My wife's cat had to be treated for a thyroid condition a couple of years
ago. The treatment is radiation. The cat becomes radioactive. You can't
spend time close to it until the radioactivity has decayed, and if you dispose of
the litter improperly, landfill radiation alarms go off.
I posted a blog entry on that on Blogger, one correcting a self-styled
expert in human thyroid disease. For a while, a search for "radioactive cat"
showed the posting high on Google's hit list. Once the article fell into the
archives, the article fell off Google's hit list.
Recently I posted a rant about how bad ABC television's overnight news is.
So if you search for "insomniac ABC" my silly throwaway comment is in the
first couple dozen hits. It will vanish in a few weeks as the blog rotates.
The flaw here is that Google ranks Blogger postings based on links to the
home page of the blog. Once an article goes into the archive, the home page
URL doesn't point to it, so the article's PageRank tanks, unless there are
numerous links to its permalink.
In many cases new Blogger postings get a PageRank that's too high, and
postings in the archives get a PageRank of zero.
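
To make the effect concrete, here is a rough sketch of the simplified textbook
PageRank iteration run on a toy link graph. It is not Google's actual ranking
code, and the page names (site_a, home, post, newer_post) and the damping
value of 0.85 are assumptions chosen purely for illustration. The same posting
is scored twice: once while the blog home page still links to it, and once
after a newer posting has replaced it there.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: three outside sites link only to the blog home page.
while_current = {
    "site_a": ["home"], "site_b": ["home"], "site_c": ["home"],
    "home": ["post"],   # the posting is on the home page
    "post": ["home"],
}
after_archived = {
    "site_a": ["home"], "site_b": ["home"], "site_c": ["home"],
    "home": ["newer_post"],   # the home page has moved on
    "newer_post": ["home"],
    "post": ["home"],         # nothing links to the old posting any more
}

before = pagerank(while_current)
after = pagerank(after_archived)
print("post while on home page:", round(before["post"], 3))
print("post after archiving:   ", round(after["post"], 3))
```

In this toy graph the archived posting ends up with only the baseline score
that any unlinked page gets. Adding even one outside link straight to the
posting's permalink in after_archived would lift it above that floor.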
Now Google OWNS Blogger and could easily address this. But the same
phenomenon plays out elsewhere; the home page of The New York Times has a
PageRank that is very high, but the rank of an article that's got huge value
to a researcher may be very low due to a lack of direct links to that
article's specific URL.
/rich
PS --
http://wigblog.blogspot.com/2003/11/our-cat-is-radioactive-self-styled.html
On 7/6/05, Roy Tennant <roy.tennant at ucop.edu> wrote:
>
> Lars' question and Patricia's answer overlook the fact that Google
> is making a huge assumption about user needs, and creating a system
> that fulfills that assumption but provides no mechanisms for the user
> to change those assumptions. Allow me to be specific.
>
> Sometimes I want to find, for example, brand new web pages -- pages
> that are so new I'm not even sure if Google has crawled them yet. But
> based on the PageRank algorithm as I understand it, these pages would
> naturally fall to the bottom of the search results. Does Google
> provide any method to reverse-sort the results? No. Does Google
> provide a mechanism to view results based on date added to the index?
> No. Does Google provide a mechanism to sort results based on the last
> change date of the page itself? No. So what are we left with? Trying
> to get to the "end" of the search results, wherever that may be.
> Sorry, but that's bad interface design. The fact that you can't,
> apparently, even do it given the system's own mechanisms is flat out
> indefensible. Or, if their numbers are in fact completely wrong and
> there are really only 900 items instead of 15,000 then I guess
> they're just lying to us.
>
> Google does one thing, and it appears to do that one thing well. But
> let's not make the unfortunate assumption that it does more than that
> one, very specific, thing.
> Roy
>