[Web4lib] Google Allows Downloads of out-of-copyright Books

Tue Sep 5 12:37:17 EDT 2006

On the subject of missing pages, has anyone else noticed that Google 
often seems to skip plates? The pattern recurs too often to be 
coincidence -- are they being skipped on purpose?

-Rob-

Rob Weidman <http://dig.lib.lehigh.edu/row3/>
Digital Library Technical Coordinator
Lehigh University
row3 at lehigh.edu
610-758-3043

Karen Coyle wrote:
> I suspect that it's the correcting, rather than finding errors, that 
> is onerous. I, too, was thinking of having somewhere that people could 
> note which books have errors (I just downloaded one that I wanted and 
> found pages missing -- very disappointing). Now I think we should have 
> a place where people can report books that appear to be good scans so 
> that other libraries can concentrate on the books that AREN'T on that 
> list. In the end, though, it's really only economical to do QC as part 
> of the scanning process, when you have the book and the scanning 
> equipment and the operators right there. Like most other activities, 
> cleanup after the fact is the least desirable way to go about it.
>
> kc
>
> Patricia F Anderson wrote:
>> Perhaps take a folksonomy approach -- have a system by which patrons 
>> can report or recommend correction of errors they discover. A 
>> wikipedia model, perhaps. Just brainstorming, but it could take the 
>> burden of correction off the local coders.
>>
>>  -- Patricia Anderson, pfa at umich.edu
>>
>> On Mon, 4 Sep 2006, Perry Willett wrote:
>>
>>> We've been concentrating on releasing our access system first, so we 
>>> haven't thought much about it. I don't think there's any issue about 
>>> whether our agreement with Google will allow us--I think it's 
>>> something we are allowed to do. The sheer volume of the task is 
>>> daunting, however.
>>>
>>> Perry Willett
>>> Head, Digital Library Production Service
>>> 300 Hatcher North
>>> University of Michigan
>>> Ann Arbor MI 48109-1205
>>> Ph: 734-764-8074
>>> Fax: 734-647-6897
>>> Email: pwillett at umich.edu
>>>
>>>
>>> On Sat, 2 Sep 2006, Karen Coyle wrote:
>>>
>>>> Thank you. And I am SO glad that Michigan shows the underlying text 
>>>> (which Google doesn't -- at least not currently). Seeing the text, 
>>>> which is the input to the index, will help librarians and power 
>>>> users better understand search results and formulate strategies 
>>>> for searching. OCR has some quirks, and seeing them can only help.
>>>>
>>>> Another thought: any chance that Michigan (or any other Google 
>>>> libraries) will take on the task of correcting the OCR? (Assuming 
>>>> they have the right to do so.)
>>>>
>>>> kc
>>>>
>>>> Perry Willett wrote:
>>>>> Just to clear this up, we're getting both image and OCR files from 
>>>>> Google for each page. You'll see this specified in our agreement 
>>>>> with Google on p. 4:
>>>>> <http://www.lib.umich.edu/mdp/um-google-cooperative-agreement.pdf>
>>>>>
>>>>> Perry Willett
>>>>> Head, Digital Library Production Service
>>>>> 300 Hatcher North
>>>>> University of Michigan
>>>>> Ann Arbor MI 48109-1205
>>>>> Ph: 734-764-8074
>>>>> Fax: 734-647-6897
>>>>> Email: pwillett at umich.edu
>>>>>
>>>>>> ------------------------------
>>>>>> Date: Thu, 31 Aug 2006 14:07:43 -0700
>>>>>> From: Karen Coyle <kcoyle at kcoyle.net>
>>>>>> Subject: Re: [Web4lib] Google Allows Downloads of out-of-copyright Books
>>>>>>
>>>>>> Interesting example. If you go to page 1 you get a message saying
>>>>>> "This page does not contain any text recoverable by the OCR engine."
>>>>>> Is it possible that Michigan is providing OCR "on the fly"?
>>>>> _______________________________________________
>>>>> Web4lib mailing list
>>>>> Web4lib at webjunction.org
>>>>> http://lists.webjunction.org/web4lib/
>>>>
>>>> -- 
>>>> -----------------------------------
>>>> Karen Coyle / Digital Library Consultant
>>>> kcoyle at kcoyle.net http://www.kcoyle.net
>>>> ph.: 510-540-7596
>>>> fx.: 510-848-3913
>>>> mo.: 510-435-8234
>>>> ------------------------------------