[WEB4LIB] Re: More on Google digitization and Europe

Karen Coyle kcoyle at kcoyle.net
Thu Apr 28 11:29:47 EDT 2005


I feel a need to point out some differences between some of these 
digitization projects, because the differences are significant.

Google is scanning texts, running them through OCR, and creating indexes 
based on *un-corrected* OCR. So they aren't interested in reproducing 
the text itself as text, and are using a "good enough" approach to 
access. It favors quantity over quality. It reminds me of a 
remark made by a Stanford CS student after I gave a lecture to his class 
showing them what they miss when they do keyword searching on the net. 
His comment was: "But I always get something." Google will be delivering 
something. We don't even know if their quality control is such that they 
will notice if their scanning process misses some pages. I would bet 
that they won't scan inserts or tipped-in pages, especially if they aren't 
standard size. This is not a criticism of the Google project; they've 
chosen this method as an economic way to do something that would be 
unaffordable otherwise. It's a simple trade-off. But the resulting files 
may not be a viable digital substitute for the book, and I would be 
hesitant to consider them as preservation quality copies.

Project Gutenberg focuses on the text qua text, with its claim that 
plain text is the most "universal" format. As such, it loses the 
"artifact" quality of the book, such as fonts, page numbers, layout, and 
presumably any illustrations as well. Again, they made a trade-off, 
since their main goal is delivering texts as inexpensively as possible 
over the Net.

Some projects, like the University of Virginia e-text project, work very 
hard to retain the "feel" of the original item, making a careful 
selection of items to work with, and reproducing images. I assume that 
they have done strict quality control. They do not have a large quantity 
of items. The Runeberg project appears to have a similar philosophy. 
These digital editions could be used for careful study of the text as well as 
giving a glimpse into the nature of the original hard copy item.

Each of these projects converts some hardcopy materials to a digital 
format. But that's where the similarity ends.

kc

Lars Aronsson wrote:

>Bernie Sloan quoted Deutsche Welle:
>
>>"Nineteen European national libraries have joined forces against 
>>a planned communications revolution by Internet search giant 
>>Google to create a global virtual library, organizers said
>
>If the plans for Google Print, as described in the press release, 
>are indeed perceived as a "communications revolution", then I must 
>congratulate the marketing and publicity people at Google.  Maybe 
>we were impressed by the engineers at Google, but the marketing 
>side of that firm surely is not lagging behind.
>
>What the press release (in December) said is that ten years from 
>now, Google Print is going to have 15 million volumes digitized.  
>I believe so too, but predicting anything ten years into the 
>future is science fiction.  Just think where we were ten years 
>ago, and try even to predict the dotcom crash.  Google, founded in 
>1998, hasn't been around for ten years yet.
>
>Project Gutenberg (gutenberg.org) was indeed around ten years ago, 
>and has a track record of doubling their collection every year.  
>They now have 15,000 books online, so ten years from now they 
>could have 15 million books, since 2 to the power of 10 is 1024.  
>Google Print needs only to copy and index them.  Any problems with 
>Project Gutenberg's textual quality so far need only be a problem 
>in the oldest 15,000 e-books out of 15 million.
>
>My own Project Runeberg (runeberg.org) was also around ten years 
>ago.  Its growth has been less continuous than that of PG, but we 
>currently have some 800 volumes of classic Scandinavian literature 
>online and are now doubling annually.  Doing this entirely on a 
>volunteer basis, we are leaving the current digitization efforts 
>of the national libraries in Denmark, Finland and Sweden (as 
>mentioned among the 19 in the Deutsche Welle article) far behind.  
>This could change in the course of ten years, so we limit our 
>predictions to a few months.  Still, why would we feel a "threat" 
>from Google or anybody else?  Google is our best friend. That's 
>how people find our books.
>
>The only reason I can see for portraying Google (or anything 
>American) as a threat is that it appears to be a working solution 
>for attracting funding from the European Union.  Our national 
>libraries are far better at this than they are at digitization.
>
>I couldn't find any news or announcements on this new European 
>deal on www.kb.dk, www.lib.helsinki.fi or www.kb.se.  Neither at 
>www.ddb.de. Bibliotheque nationale de France has a collection of 
>articles at http://www.bnf.fr/pages/dernmin/com_google.htm but 
>nothing is mentioned about the 19-country coalition.
>

-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle at kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------



