[Web4lib] Web4lib: Wikipedia

Thu Mar 18 00:38:44 EDT 2010

Good stuff for people to ponder, Tim. We'll agree to disagree on
fundamentals.

Happy St. Patrick's Day,
Michael

Michael aka DrWeb
drweb2 at gmail.com
DrWeb2 at Twitter / http://drweb.typepad.com/

On Wed, Mar 17, 2010 at 7:30 PM, Tim Spalding <tim at librarything.com> wrote:

> LARS:
>
> > Exactly that comparison is being made millions of
> > times each day -- by Google's ranking algorithm.
>
> This isn't so. Google isn't deciding between Wikipedia and published
> reference sources. Virtually no published sources are actually on the
> open web, where people could link to them and use them, and it's links
> that drive the Google algorithm.(1)
>
> Pick any list of standard reference sources you like. They're not
> competing with Wikipedia. They're in a separate universe of content.
> They're in the world that wants/needs/does get paid for writing and
> publishing such works. Wikipedia is a great and marvelous resource,
> but there's no getting past the fact that it wins because it doesn't
> compete with published reference sources in search rankings.
>
> MICHAEL:
>
> > Until Wikipedia allows *me* or anyone to see who is writing that
> > item, I will not support Wikipedia.
>
> Do you read the Economist? The OED? Neither list authors. Are you
> *really* searching out the little initials in some reference works, to
> see what august professor of such-and-such lurks behind each article
> in a standard reference source? Do you follow Wikipedia histories to
> their end? (Quite often you CAN know who made an edit.) I think you're
> using anonymity as a stand-in for larger questions of production.
>
> > Wikipedia. It's a flaw in the page ranking algorithm, in that in general,
> > numbers of sheer links will overwhelm any measure of "authority."  WHO is
> > linking to that link matters; the algorithm does not.
>
> It's repeatedly misstated, but the PageRank algorithm does not measure
> the *number* of links. It measures the authority of pages, as defined
> by the authority of pages that link to it, recursively, with each
> page given a trivial starting PR. So, a link from the New York Times
> is probably a hundred million times more valuable to a site than a
> link from Joe's blog.(2) There is no Turk in the machine, checking
> credentials, but to describe it as a game of sheer numbers isn't to
> describe it accurately.
>
> Tim
>
> 1.  Google sometimes mixes some Google Books data in, but not much,
> for the simple reason that Google Books isn't a place to read books.
> It's a place to search for snippets and a place to buy the book and
> have it shipped to you, activities most people aren't looking for when
> they make a search.
> 2. A standard personal blog will have a PR of maybe 2. PageRank is a
> log score. Assuming it's a log 10 (and it's now much higher), then the
> NYT's PageRank is 10 million times that of a personal blog.
>
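
Tim's description of PageRank (authority defined recursively by the
authority of linking pages, each page starting from a trivial score) and
the log-scale arithmetic in his footnote 2 can be sketched roughly as
below. This is a minimal illustration of the published power-iteration
idea, not Google's production algorithm. The damping factor of 0.85, the
iteration count, the page names in example_links, and the helper
toolbar_ratio are all assumptions made for illustration. The NYT score
of 9 is inferred from the footnote's "10 million" figure (a gap of seven
points on a base-10 scale); it is not stated in the message.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rough power-iteration PageRank.

    links maps each page name to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Every page starts with the same trivial score.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current authority among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its authority evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def toolbar_ratio(pr_high, pr_low, base=10):
    """Ratio of underlying scores for two log-scale PR values.

    base=10 is the assumption made in footnote 2.
    """
    return base ** (pr_high - pr_low)


# Hypothetical link graph: site_a has ONE inbound link, from a well-linked
# hub; site_b has TWO inbound links, from pages nobody links to.
example_links = {
    "fan1": ["hub"], "fan2": ["hub"], "fan3": ["hub"],
    "fan4": ["hub"], "fan5": ["hub"],
    "hub": ["site_a"],
    "blog1": ["site_b"], "blog2": ["site_b"],
}

if __name__ == "__main__":
    ranks = pagerank(example_links)
    for page in ("site_a", "site_b"):
        print(page, round(ranks[page], 4))
    # Personal blog at PR 2 versus an assumed PR 9 for the NYT.
    print(toolbar_ratio(9, 2))
```

In this toy graph site_a ends up with a higher score than site_b despite
having fewer inbound links, which is the point being made about sheer
numbers of links: what counts is the authority of the page doing the
linking.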