[Web4lib] Google Allows Downloads of out-of-copyright Books

Richard Wiggins richard.wiggins at gmail.com
Mon Sep 4 12:09:12 EDT 2006


It is heartening to learn that Michigan has the text and is willing to offer
it.

Ten years or so ago I heard a wise person propose that a hierarchy of
services would evolve for digital libraries.  Maybe, for instance, you
delay OCRing the full text until a person actually calls for the book.
The Google Book project changes all that; the OCRing of full text is part of
the appeal of its search function.  But you could still defer human
correction or more careful OCRing.
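A minimal sketch of that kind of service hierarchy, in Python: a cheap first-pass text is always available for searching, and the costlier step (better OCR or human correction) runs only when someone actually requests the book, and only once. The class name `TieredText` and the `expensive_pass` callback are hypothetical, chosen for illustration; they do not describe how Google or any partner library actually works.

```python
from typing import Callable


class TieredText:
    """Hypothetical two-tier text record: cheap text now, better text on demand."""

    def __init__(self, cheap_text: str, expensive_pass: Callable[[], str]):
        self._text = cheap_text                 # first-pass OCR, used for indexing
        self._expensive_pass = expensive_pass   # deferred work (re-OCR, correction)
        self.upgraded = False

    def search_text(self) -> str:
        # Search uses whatever text exists now; it never triggers extra work.
        return self._text

    def request(self) -> str:
        # The first reader request pays for the expensive pass; later ones reuse it.
        if not self.upgraded:
            self._text = self._expensive_pass()
            self.upgraded = True
        return self._text
```

The design choice is plain memoization: the cost of careful processing is paid per book actually requested rather than per book scanned.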

ProQuest (née UMI) used to OCR magazine text with three different OCR
engines, having them vote in case of doubt.  So you could do pretty good OCR
for the first pass, then more intensive machine OCR when someone calls for
the book, with no human labor required.  (Happy Labor Day, btw)
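A minimal sketch of that kind of voting, in Python, assuming each engine's output has already been split into words and aligned position by position (real engines often disagree on word boundaries, so a production system would need an alignment step first). The function name `vote` and the rule of falling back to the first engine when there is no majority are illustrative assumptions, not ProQuest's actual method.

```python
from collections import Counter


def vote(readings: list[list[str]]) -> tuple[list[str], list[int]]:
    """Word-level majority vote across several OCR engines' outputs.

    Assumes every engine produced the same number of words, already aligned.
    Returns the voted words plus the positions where no majority existed.
    """
    if not readings or len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be non-empty and equally long")
    result: list[str] = []
    doubtful: list[int] = []
    for i, candidates in enumerate(zip(*readings)):
        word, count = Counter(candidates).most_common(1)[0]
        if count > len(candidates) // 2:
            # A strict majority of engines agree on this word.
            result.append(word)
        else:
            # No majority: keep the first engine's reading and flag the spot.
            result.append(candidates[0])
            doubtful.append(i)
    return result, doubtful
```

With three engines reading "the quick brown fox" as "the qu1ck brown f0x" and "tbe quick brown fox", each garbled word is outvoted two to one, giving back "the quick brown fox" with no doubtful positions. The flagged positions are where a deferred human correction pass would be most useful.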

It would sure be nice if Google and its partners would just publish the
specs instead of making us divine 'em.

/rich


On 9/2/06, Karen Coyle <kcoyle at kcoyle.net> wrote:
>
> Thank you. And I am SO glad Michigan shows the underlying text
> (which Google doesn't -- at least not currently). Seeing the text, which
> is the input to the index, will help librarians and power users better
> understand search results and formulate strategies for searching. OCR
> has some quirks, and seeing them can only help.
>
> Another thought: any chance that Michigan (or any other Google
> libraries) will take on the task of correcting the OCR? (Assuming they
> have the right to do so.)
>
> kc
>
>

