[Web4lib] Google Allows Downloads of out-of-copyright Books

Karen Coyle kcoyle at kcoyle.net
Mon Sep 4 17:45:33 EDT 2006


I suspect that it's correcting errors, rather than finding them, that is 
onerous. I, too, was thinking of having somewhere that people could note 
which books have errors (I just downloaded one that I wanted and found 
pages missing -- very disappointing). Now I think we should have a place 
where people can report books that appear to be good scans so that other 
libraries can concentrate on the books that AREN'T on that list. In the 
end, though, it's really only economical to do QC as part of the 
scanning process, when you have the book and the scanning equipment and 
the operators right there. As with most other activities, cleanup after 
the fact is the least desirable way to go about it.

kc

Patricia F Anderson wrote:
> Perhaps take a folksonomy approach -- have a system by which patrons 
> can report or recommend correction of errors they discover. A 
> wikipedia model, perhaps. Just brainstorming, but it could take the 
> burden of correction off the local coders.
>
>  -- Patricia Anderson, pfa at umich.edu
>
> On Mon, 4 Sep 2006, Perry Willett wrote:
>
>> We've been concentrating on releasing our access system first, so we 
>> haven't thought much about it. I don't think there's any issue about 
>> whether our agreement with Google will allow us--I think it's 
>> something we are allowed to do. The sheer volume of the task is 
>> daunting, however.
>>
>> Perry Willett
>> Head, Digital Library Production Service
>> 300 Hatcher North
>> University of Michigan
>> Ann Arbor MI 48109-1205
>> Ph: 734-764-8074
>> Fax: 734-647-6897
>> Email: pwillett at umich.edu
>>
>>
>> On Sat, 2 Sep 2006, Karen Coyle wrote:
>>
>>> Thank you. And I am SO glad that Michigan shows the underlying text 
>>> (which Google doesn't -- at least not currently). Seeing the text, 
>>> which is the input to the index, will help librarians and power 
>>> users better understand search results and formulate strategies 
>>> for searching. OCR has some quirks, and seeing them can only help.
>>>
>>> Another thought: any chance that Michigan (or any other Google 
>>> libraries) will take on the task of correcting the OCR? (Assuming 
>>> they have the right to do so.)
>>>
>>> kc
>>>
>>> Perry Willett wrote:
>>>> Just to clear this up, we're getting both image and OCR files from 
>>>> Google for each page. You'll see this specified in our agreement 
>>>> with Google on p. 4:
>>>> <http://www.lib.umich.edu/mdp/um-google-cooperative-agreement.pdf>
>>>>
>>>> Perry Willett
>>>>
>>>>> ------------------------------
>>>>> Date: Thu, 31 Aug 2006 14:07:43 -0700
>>>>> From: Karen Coyle <kcoyle at kcoyle.net>
>>>>> Subject: Re: [Web4lib] Google Allows Downloads of out-of-copyright
>>>>> Books
>>>>>
>>>>> Interesting example. If you go to page 1 you get a message saying
>>>>> "This page does not contain any text recoverable by the OCR engine."
>>>>> Is it possible that Michigan is providing OCR "on the fly?"
>>>> _______________________________________________
>>>> Web4lib mailing list
>>>> Web4lib at webjunction.org
>>>> http://lists.webjunction.org/web4lib/
>>>>
>>>>
>>>
>
>

-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle at kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------



