[Web4lib] The sources of Wikipedia

Lars Aronsson lars at aronsson.se
Thu Sep 7 15:24:40 EDT 2006


For those of us helplessly addicted to Perl programming, one of 
the greatest joys of Wikipedia is the ability to download the 
entire database dump in XML format and dig through it for hidden 
patterns.  The dumps are available at http://download.wikimedia.org/
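The English dump is large, so it is best read as a stream rather than loaded whole. Here is a minimal sketch of that step. It uses Python instead of Perl and relies on the standard library's `xml.etree.ElementTree.iterparse`. `iter_page_texts` is a hypothetical helper name. The sample assumes the dump's `<page>`/`<title>`/`<text>` layout, with an export namespace whose version number varies between dumps.

```python
import io
import xml.etree.ElementTree as ET


def iter_page_texts(source):
    """Yield (title, wikitext) for each page in a MediaWiki XML dump, streaming."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop the export namespace, e.g. "{http://www.mediawiki.org/xml/export-0.10/}text" -> "text"
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            title = elem.text
        elif tag == "text":
            yield title, elem.text or ""
        elif tag == "page":
            elem.clear()  # free the finished page's children and text


# Tiny stand-in for a real dump file (hypothetical content).
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>See ISBN 0954381157.</text></revision>
  </page>
</mediawiki>"""

for title, text in iter_page_texts(io.BytesIO(sample)):
    print(title, "->", text)
```

If the downloaded dump is a `.bz2` file, the object returned by `bz2.open(path, "rb")` can be passed in place of the `BytesIO` sample.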

One of the original peculiarities of the wiki markup language used 
in Wikipedia's articles is that the string "ISBN" followed by one 
whitespace character and ten digits (the last of which may be an X) 
is recognized as a link to a separate page, from which you can look 
up that ISBN in various bookstores or libraries.  In the most recent dump of the 
English Wikipedia, I found 161,973 such ISBN patterns.  All books 
are created equal, but some are more equal than the rest.  I found 
the following ISBNs to be the most referenced:

Count ISBN        Title

  460 0954381157  "Trade unions of the world"
  391 0439154049  "The official Pokemon handbook"
  389 193020650X  "Official Nintendo Pokémon FireRed Version"
  387 130206151   (an error for 1930206151, another Pokemon title)
  372 1930206585  "Official Nintendo Pokémon Emerald Player's Guide"
  357 0761547614  "Prima's Official Pokemon Guide"
  346 0002169878  "Collins Guide to the Sea Fishes of New Zealand"
  342 1569315604  "Pokemon Adventures, Adventure 3: Saffron Cit..."
  334 1930206194  "Super Smash Bros. Melee, Official Guide from..."
  334 1569315086  "Pokemon Adventures: Legendary Pokemon, Vol. 2"
  333 1569314365  "Pokemon Graphic Novel vol. 3: Electric Pikac..."
  332 1930206313  "Gameboy Advance Pokemon Ruby Version and Sap..."
  332 1598120026  "Official Nintendo Pokémon XD: Gale of Darkne..."
  332 1569318514  "Pokemon Adventures, Volume 7: Yellow Caballe..."
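Producing a tally like the one above takes little more than a regular expression and a counter. The sketch below is in Python rather than Perl and assumes the page texts have already been pulled out of the dump. The regex approximates the pattern described earlier and is not necessarily MediaWiki's exact rule. `ISBN_LINK`, `count_isbns` and `isbn10_is_valid` are hypothetical names. A mistyped ten-character number would still match the pattern, so the helper also applies the standard ISBN-10 check digit as a cheap way to flag typos. Under that rule, weights 10 down to 1 are multiplied by the digits and the sum must be divisible by 11.

```python
import re
from collections import Counter

# Approximation of the pattern described above: "ISBN", one whitespace
# character, then ten characters of which the last may be an X.
ISBN_LINK = re.compile(r"\bISBN\s([0-9]{9}[0-9X])\b")


def count_isbns(texts):
    """Tally every ISBN link found in an iterable of wikitext strings."""
    counts = Counter()
    for text in texts:
        counts.update(ISBN_LINK.findall(text))
    return counts


def isbn10_is_valid(isbn):
    """Standard ISBN-10 check: weights 10..1, weighted sum divisible by 11."""
    if len(isbn) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), isbn):
        if ch in "0123456789":
            value = int(ch)
        elif ch == "X" and weight == 1:  # X (= 10) is only legal as the check digit
            value = 10
        else:
            return False
        total += weight * value
    return total % 11 == 0


texts = [
    "Source: ISBN 0954381157.",
    "See ISBN 0954381157 and ISBN 193020650X.",
]
counts = count_isbns(texts)
for isbn, n in counts.most_common():
    print(n, isbn, "ok" if isbn10_is_valid(isbn) else "bad check digit")
```

Feeding `count_isbns` the texts yielded by a streaming dump reader and printing `counts.most_common(14)` would give a list shaped like the table above.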

Well, I could go on, but I'll stop there.  I guess all it takes is 
a handful of people with a strong interest in Pokemon who are very 
careful to cite sources with ISBN numbers, and pretty soon they 
outnumber everybody except the guy who wrote 460 articles about 
trade unions, always citing the same book.


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se

