[Web4lib] The sources of Wikipedia

Lars Aronsson lars at aronsson.se
Sat Sep 9 05:24:45 EDT 2006


Karen Coyle wrote:

> Lars, This is a wonderful example of the difference between 
> scholarly environments and popular environments. With popular 
> environments, it's, well it's popularity that matters.

I guess we're all "popular" (people-oriented) unless we live as 
hermits in the desert.  The question is which people we're among. 

Wikipedia is written by its users, and a big question right now is 
who these users are.  See for example the recent article by Aaron 
Swartz [1].  While I don't agree with his rhetoric, the question 
is worth pondering.

  [1] http://www.aaronsw.com/weblog/whowriteswikipedia

Each language branch of Wikipedia is not only a separate edition, 
but also a different community of editors.  This also changes over 
time. If you speak Latvian, joining the Latvian Wikipedia now, when 
it has 5000 articles, is similar to joining the English Wikipedia 
in mid-2001, when it also had 5000 articles.  As I 
reported, the Pokemon manuals are the top references for the 
English Wikipedia.  In the much smaller Swedish Wikipedia, the 
most cited works are two books on the construction of church 
organs. That's hardly a fashion trend, but the result of one or 
maybe two very active users.  All it shows is that citing 
literature in Wikipedia is still in its infancy, where individuals 
can have a big impact.  Rather than the top 10, we should be 
looking for patterns in the long tail, where most of the people are. 
If I could easily map ISBN to (Dewey?) classification, then I 
could compile statistics on distribution of literature references 
over topic classes.
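
A minimal sketch in Python of what such statistics could look like. 
It assumes that an ISBN-to-Dewey mapping already exists as a plain 
dictionary, which is exactly the part that is missing today. The 
names dewey_main_class and class_distribution are made up for 
illustration.

```python
from collections import Counter


def dewey_main_class(dewey_number: str) -> str:
    """Reduce a Dewey number such as '786.5' to its main class, '700'."""
    integer_part = dewey_number.strip().split(".")[0]
    hundreds = int(integer_part) // 100
    return f"{hundreds}00"


def class_distribution(cited_isbns, isbn_to_dewey):
    """Count citations per Dewey main class.

    cited_isbns   -- one entry per citation, so repeated ISBNs count repeatedly
    isbn_to_dewey -- hypothetical lookup table {isbn: dewey_number}
    """
    counts = Counter()
    for isbn in cited_isbns:
        dewey = isbn_to_dewey.get(isbn)
        if dewey is None:
            # Keep unmapped ISBNs visible instead of silently dropping them.
            counts["unclassified"] += 1
        else:
            counts[dewey_main_class(dewey)] += 1
    return counts
```

Counting per citation rather than per distinct ISBN is a choice: it 
lets a single heavily cited work dominate its class, which may or 
may not be what you want when looking at the long tail.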

The books that get cited in hundreds of articles are likely to be 
biographical dictionaries (of church organ builders), overviews of 
all fish species in New Zealand, or Pokemon user guides.  Perhaps 
those shouldn't be used as references at all.  For a biographical 
article in Wikipedia, should you cite a printed biographical 
dictionary or some other, more solid research about the person? 
Perhaps the top 10 list is a to-do list, indicating which articles 
need to be improved, because they were based on facts from a work 
that also provided information for many other articles.

By the way, the extraction of ISBNs from Wikipedia is now handled 
by a more general script that I published here,
http://meta.wikimedia.org/wiki/User:LA2/Extraktor
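
To give an idea of what ISBN extraction involves, here is a minimal 
Python sketch. It is not the Extraktor script and makes no claim 
about how that script works. It assumes that ISBNs appear in the 
article text as the word "ISBN" followed by a 10-digit number, 
optionally hyphenated, and it only keeps numbers whose ISBN-10 check 
digit is correct. The function names are made up for illustration.

```python
import re

# "ISBN", optional whitespace or colon, then digits and hyphens ending in
# a digit or X. The lookahead rejects longer runs (for example 13-digit
# numbers) instead of matching a prefix of them.
ISBN_PATTERN = re.compile(r"ISBN[\s:]*([0-9][0-9-]{8,11}[0-9Xx])(?![0-9Xx-])")


def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 check digit: weighted sum (10..1) divisible by 11."""
    if len(isbn) != 10 or not isbn[:9].isdigit() or isbn[9] not in "0123456789X":
        return False
    total = sum(
        (10 - position) * (10 if char == "X" else int(char))
        for position, char in enumerate(isbn)
    )
    return total % 11 == 0


def extract_isbn10s(wikitext: str) -> list:
    """Return all valid ISBN-10s found in the text, hyphens removed."""
    found = []
    for match in ISBN_PATTERN.finditer(wikitext):
        candidate = match.group(1).replace("-", "").upper()
        if is_valid_isbn10(candidate):
            found.append(candidate)
    return found
```

Checking the check digit filters out most typos, but it also means 
that mistyped ISBNs in articles are silently dropped rather than 
reported.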

This script will also tell you which articles reference which 
ISBN.  This can be useful for a reverse lookup. You could 
make your library catalog indicate: "The following Wikipedia 
articles make reference to this ISBN".

For example, for ISBN 0826317243, "Wisdom sits in places : 
landscape and language among the Western Apache" (1996) you can 
report that it is referenced from these Wikipedia articles, two in 
English and two in German:

http://en.wikipedia.org/wiki/Southern_Athabaskan_languages/Bibliography
http://en.wikipedia.org/wiki/Western_Apache/Bibliography
http://de.wikipedia.org/wiki/Westliche_Apachen
http://de.wikipedia.org/wiki/Apache_%28Sprache%29/Literatur
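
On the catalog side, the reverse lookup could be sketched in Python 
roughly as follows. This assumes the extraction step delivers 
(article URL, ISBN) pairs. That input format, and the names 
build_reverse_index and catalog_note, are assumptions made for 
illustration, not the actual output of the script.

```python
from collections import defaultdict


def build_reverse_index(citations):
    """Turn (article_url, isbn) pairs into {isbn: sorted list of article URLs}."""
    index = defaultdict(set)
    for article_url, isbn in citations:
        # A set avoids listing an article twice if it cites the ISBN twice.
        index[isbn].add(article_url)
    return {isbn: sorted(urls) for isbn, urls in index.items()}


def catalog_note(isbn, index):
    """Build the note a catalog record could display, or '' if nothing cites it."""
    articles = index.get(isbn, [])
    if not articles:
        return ""
    lines = ["The following Wikipedia articles make reference to this ISBN:"]
    lines.extend(f"  {url}" for url in articles)
    return "\n".join(lines)
```

Fed with the four pairs for ISBN 0826317243 listed above, 
catalog_note would return the heading followed by those four URLs, 
one per line.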


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se

