[Web4lib] German Wikipedia to be published in book form?

Lars Aronsson lars at aronsson.se
Wed Apr 23 00:42:39 EDT 2008


B.G. Sloan wrote:

> Editors will distil 50,000 of the most popular entries in the 
> German version of Wikipedia into the 1,000-page volume to go on 
> sale in September.

The German Wikipedia contains more than 700,000 articles, many of
which are quite long.  What they're doing here is picking the
50,000 most visited articles, based on available visitor
statistics, and extracting the first paragraph or sentence from
each.  Even though 1,000 pages makes for quite a thick volume,
every page needs to fit 50 articles (50,000 / 1,000), so the
entries can't be very long.  You learn that the Titanic was a ship
that sank, but not much more.

This sounds like something you could do with a Perl script in an
afternoon.  In that respect, it's a neat hack.  The hard part is
weeding out the articles that happened to contain vandalism at the
moment the text was extracted, and listing all the authors in a
way that satisfies the GNU Free Documentation License (GNU FDL).
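
To make the "afternoon script" idea concrete, here is a minimal
sketch in Python rather than Perl.  It is not the publisher's
actual method.  It assumes the visitor statistics are already
available as a mapping from title to visit count and the article
texts as plain text (not wiki markup).  The names top_titles,
first_sentence and build_digest are hypothetical, and the sentence
splitting is deliberately naive.  It does nothing about vandalism
or author attribution, which is the hard part.

```python
import re


def top_titles(visits, n):
    """Return the n most visited titles, most visited first.

    visits is an assumed mapping {title: visit_count}; ties are
    broken alphabetically so the result is deterministic.
    """
    ranked = sorted(visits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:n]]


def first_sentence(text):
    """Return the first sentence of the first paragraph.

    Naive: splits at the first '.', '!' or '?' followed by
    whitespace, so abbreviations and German dates such as
    '15. April' will cut the sentence short.
    """
    first_para = text.strip().split("\n\n", 1)[0]
    first_para = " ".join(first_para.split())  # collapse line breaks
    match = re.search(r"(.+?[.!?])(\s|$)", first_para)
    return match.group(1) if match else first_para


def build_digest(visits, articles, n):
    """Pair each of the top n titles with its first sentence.

    Titles without article text are skipped, so the result may
    hold fewer than n entries.
    """
    return [
        (title, first_sentence(articles[title]))
        for title in top_titles(visits, n)
        if title in articles
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    visits = {"Titanic": 900, "Berlin": 1200, "Obscure": 3}
    articles = {
        "Titanic": "The Titanic was a ship that sank in 1912. "
                   "It was large.\n\nMore text.",
        "Berlin": "Berlin is the capital of Germany.\n\nHistory ...",
        "Obscure": "An obscure topic.",
    }
    for title, summary in build_digest(visits, articles, 2):
        print(f"{title}: {summary}")
```

For the real volume, n would be 50,000 and the output would have to
fit about 50 entries per page.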

The exciting part is that Bertelsmann (the German media giant that
owns Random House), already a major publisher of encyclopedias,
is putting its name behind this.  It is a surprise move that
parallels BMG's deal with Napster some years ago.  BMG (now part
of Sony BMG) is the Bertelsmann Music Group.


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se



