[Web4lib] Google Allows Downloads of out-of-copyright Books
Karen Coyle
kcoyle at kcoyle.net
Thu Aug 31 15:33:22 EDT 2006
Jonathan Gorman wrote:
>
> I'm skeptical of this. I've followed OCR on and off again out of my
> own interest over the last few years. To be able to handle nearly any
> book pre-1923 with a reasonable error rate is a bit tricky. Even if
> the rate is really good, most projects I have seen require human
> effort and pre-processing to get these rates. I'd be glad to be
> proven wrong here.
>
>
OCR companies claim that they can get a 98-99.9% accuracy rate directly
out of their software. (One of the main companies is Abbyy:
http://www.abbyy.com) They also claim to be able to OCR 177 languages.
It's pretty impressive, but remember that 99.9% means that there is one
bad character, on average, for every 1000 characters, which means roughly
one "typo" per page. I don't know how this compares to the
readers used by the blind (is Kurzweil still the main one?)
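To make the arithmetic concrete, here is a rough sketch in Python. The
page sizes (1000 and 2000 characters) are my own assumptions for
illustration, not vendor figures, and the function name is made up:

```python
def expected_errors(accuracy, chars_per_page):
    """Expected number of misrecognized characters on one page."""
    return (1.0 - accuracy) * chars_per_page


# Accuracy rates from the claimed 98-99.9% range; page sizes are assumed.
for accuracy in (0.98, 0.99, 0.999):
    for chars in (1000, 2000):
        errors = expected_errors(accuracy, chars)
        print(f"{accuracy:.1%} accuracy, {chars} chars/page: "
              f"about {errors:.0f} bad characters")
```

At the low end of the claimed range (98%) the same calculation gives
about 20 bad characters on a 1000-character page, so where a product
falls within that range matters a great deal.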
kc
--
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle at kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------