[Web4lib] Web4lib: Wikipedia

Wed Mar 17 22:30:20 EDT 2010

LARS:

> Exactly that comparison is being made millions of
> times each day -- by Google's ranking algorithm.

This isn't so. Google isn't deciding between Wikipedia and published
reference sources. Virtually no published sources are actually on the
open web, where people could link to them and use them, and it's links
that drive the Google algorithm.(1)

Pick any list of standard reference sources you like. They're not
competing with Wikipedia. They're in a separate universe of content.
They're in the world that wants/needs/does get paid for writing and
publishing such works. Wikipedia is a great and marvelous resource,
but there's no getting past the fact that it wins because it doesn't
compete with published reference sources in search rankings.

MICHAEL:

> Until Wikipedia allows *me* or anyone to see who is writing that
> item, I will not support Wikipedia.

Do you read the Economist? The OED? Neither lists authors. Are you
*really* searching out the little initials in some reference works, to
see what august professor of such-and-such lurks behind each article
in a standard reference source? Do you follow Wikipedia histories to
their end? (Quite often you CAN know who made an edit.) I think you're
using anonymity as a stand-in for larger questions of production.

> Wikipedia. It's a flaw in the page ranking algorithm, in that in general,
> numbers of sheer links will overwhelm any measure of "authority."  WHO is
> linking to that link matters; the algorithm does not.

It's repeatedly misstated, but the PageRank algorithm does not measure
the *number* of links. It measures the authority of a page, as defined
by the authority of the pages that link to it, recursively, with each
page given a trivial starting PR. So, a link from the New York Times
is probably a hundred million times more valuable to a site than a
link from Joe's blog.(2) There is no Turk in the machine, checking
credentials, but to describe it as a game of sheer numbers isn't to
describe it accurately.
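
To make the recursive definition above concrete, here is a minimal sketch of the simplified, textbook form of PageRank. It is not Google's production ranking. The damping value of 0.85, the way dead-end pages are handled, and every page name in the toy graph are assumptions made for illustration. Both site_a and site_b receive exactly one inbound link, so a pure link count would tie them.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    The damping value is an assumed, commonly quoted default.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    # Every page starts with the same trivial score.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its own authority on, split across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead-end page: spread its score evenly (one common convention).
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: four pages link to big_paper, nobody links to joes_blog.
links = {
    "reader1": ["big_paper"],
    "reader2": ["big_paper"],
    "reader3": ["big_paper"],
    "reader4": ["big_paper"],
    "big_paper": ["site_a"],   # site_a's only inbound link
    "joes_blog": ["site_b"],   # site_b's only inbound link
}

ranks = pagerank(links)
for page in ("big_paper", "joes_blog", "site_a", "site_b"):
    print(page, round(ranks[page], 4))
```

In this sketch site_a ends up with a higher score than site_b even though each has a single inbound link, because the page linking to site_a is itself well linked.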

Tim

1. Google sometimes mixes in some Google Books data, but not much,
for the simple reason that Google Books isn't a place to read books.
It's a place to search for snippets and a place to buy the book and
have it shipped to you, activities most people aren't looking for when
they make a search.
2. A standard personal blog will have a PR of maybe 2. PageRank is a
log score. Assuming it's log base 10 (and it's now much higher), the
NYT's PageRank is 10 million times that of a personal blog.
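
The arithmetic in footnote 2 can be sketched as follows. The blog score of 2 and the base of 10 come from the footnote. The NYT score of 9 is an assumption: it is simply the value that makes the 10 million figure work out, since the footnote does not state it. The function name is made up for illustration.

```python
def authority_ratio(pr_high, pr_low, base=10):
    """How many times more raw PageRank the higher-scored page has,
    if the displayed score is a logarithm in the given base."""
    return base ** (pr_high - pr_low)


blog_pr = 2  # "a PR of maybe 2", from the footnote
nyt_pr = 9   # assumed: the score implied by the 10 million figure

ratio = authority_ratio(nyt_pr, blog_pr)
print(f"{ratio:,}")  # 10,000,000
```

A larger base makes the gap grow quickly: with the same two scores and an assumed base of 15, the ratio is 15 ** 7, roughly 170 million.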