[Web4lib] Google Allows Downloads of out-of-copyright Books

Thu Aug 31 15:29:30 EDT 2006

On Thu, 31 Aug 2006, Jonathan Gorman wrote:
> To make my point clearer, these books should be able to be addressed by 
> the same measures libraries are already taking to serve patrons in the 
> real world.  We shouldn't be waiting for Google to do it for us.  To turn 
> down or not point out resources to patrons online because they don't meet 
> criteria that we would not hold our own services to seems suspect.  The 
> question that I was responding to was the suggestion that perhaps we 
> should not recommend Google books to people due to accessibility issues.

Ah, gotcha.  Makes sense.

> > Technologically it is
> > not that difficult.
> 
> I'm skeptical of this.  I've followed OCR on and off out of my own 
> interest over the last few years.  To be able to handle nearly any book 
> pre-1923 with a reasonable error rate is a bit tricky.  Even if the 
> rate is really good, most projects I have seen require human 
> effort and pre-processing to get these rates.  I'd be glad to be proven 
> wrong here.

Well, they have the OCR in place.  It's not perfect, but it's
good enough to support search and to highlight the found word in
the PDF of the scan. (Try it.)  So all they need to do is expose that
plaintext, either directly to screenreaders, or, more flexibly,
as a downloadable file.  It won't be perfect, but it will be as
good as the current search functionality.
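
A rough sketch of the idea, in Python.  It assumes the OCR output is
stored as words with page coordinates, which is only a guess and not
anything Google has documented.  OcrWord, find_hits, and to_plaintext
are made-up names for illustration.  The point is that the same data
that drives search highlighting can be flattened into plaintext.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OcrWord:
    """One recognized word (hypothetical structure, not Google's format)."""
    text: str
    page: int
    box: Tuple[int, int, int, int]  # x0, y0, x1, y1 on the scanned page


def find_hits(words: List[OcrWord], query: str) -> List[OcrWord]:
    """Return the words matching the query; their boxes are what a
    viewer would highlight on the page image."""
    q = query.lower()
    return [w for w in words if w.text.lower().strip(".,;:!?\"'()") == q]


def to_plaintext(words: List[OcrWord]) -> str:
    """Flatten the same OCR words into text a screenreader or a
    download link could use.  Pages are separated by a blank line."""
    pages: Dict[int, List[str]] = {}
    for w in words:
        pages.setdefault(w.page, []).append(w.text)
    return "\n\n".join(" ".join(pages[p]) for p in sorted(pages))
```

Any OCR errors show up in both outputs equally, since both come from
the same list of words.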

And maybe JAWS can already see the plaintext.  Dragon can't,
though.

-Deborah
-- 
Deborah Kaplan
Digital Initiatives Librarian
Brandeis University