[Web4lib] Google Allows Downloads of out-of-copyright Books

Karen Coyle kcoyle at kcoyle.net
Thu Aug 31 15:33:22 EDT 2006

Jonathan Gorman wrote:
> I'm skeptical of this.  I've followed OCR on and off again out of my 
> own interest over the last few years.  To be able to handle nearly any 
> book pre-1923 with a reasonable error rate is a bit tricky.  Even if 
> the rate is really good, most projects I have seen require human 
> effort and pre-processing to get these rates.  I'd be glad to be 
> proven wrong here.
OCR companies claim that they can get a 98-99.9% accuracy rate directly 
out of their software. (One of the main companies is Abbyy: 
http://www.abbyy.com) They also claim to be able to OCR 177 languages. 
It's pretty impressive, but remember that 99.9% means that there is, on 
average, one bad character for every 1000 characters, which works out to 
roughly one "typo" per page. I don't know how this compares to the 
readers used by the blind (is Kurzweil still the main one?)
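
To make the arithmetic concrete, here is a rough sketch of the calculation. 
The function name and the 1000-characters-per-page figure are my own 
assumptions for illustration, not anything the OCR vendors publish; 
a denser page would scale the count up proportionally.

```python
def expected_errors(accuracy_percent: float, chars: int) -> float:
    """Expected number of misrecognized characters among `chars` characters,
    assuming errors are spread evenly at the stated accuracy rate."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return error_rate * chars


# Assumption: about 1000 characters per page, which is the page size
# implied by "one typo per page" at 99.9% accuracy.
CHARS_PER_PAGE = 1000

if __name__ == "__main__":
    for accuracy in (98.0, 99.0, 99.9):
        errors = expected_errors(accuracy, CHARS_PER_PAGE)
        print(f"{accuracy}% accuracy: about {errors:.1f} bad characters per page")
```

At the low end of the claimed range (98%), the same calculation gives about 
20 bad characters per 1000, so the spread between the best and worst 
advertised figures is large.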


Karen Coyle / Digital Library Consultant
kcoyle at kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596
fx.: 510-848-3913
mo.: 510-435-8234
