[Web4lib] Google Allows Downloads of out-of-copyright Books

Robert Weidman row3 at Lehigh.EDU
Tue Sep 5 14:51:24 EDT 2006


Sorry, I should have included some examples of the missing plates:

William Winter: Old Friends
http://books.google.com/books?id=XTk2SsZ9JCsC&pg=PA3
Only has the Frontispiece and one of the plates from the text (missing 
the 19 other plates)

Rose Eytinge: The memories of Rose Eytinge
http://books.google.com/books?id=8TQLAAAAIAAJ&pg=PR3
Only has the Frontispiece (missing the 7 other plates)

I have seen many others like this...

-rob-

> Robert Weidman wrote:
>> On the subject of missing pages, has anyone else noticed that Google 
>> seems to often skip plates? It seems to be too much of a recurring 
>> pattern to be coincidence -- are these skipped purposefully?
>>
>> -Rob-
>>
>> Rob Weidman <http://dig.lib.lehigh.edu/row3/>
>> Digital Library Technical Coordinator
>> Lehigh University
>> row3 at lehigh.edu
>> 610-758-3043
>>
>>
>> Karen Coyle wrote:
>>> I suspect that it's the correcting, rather than finding errors, that 
>>> is onerous. I, too, was thinking of having somewhere that people 
>>> could note which books have errors (I just downloaded one that I 
>>> wanted and found pages missing -- very disappointing). Now I think 
>>> we should have a place where people can report books that appear to 
>>> be good scans so that other libraries can concentrate on the books 
>>> that AREN'T on that list. In the end, though, it's really only 
>>> economical to do QC as part of the scanning process, when you have 
>>> the book and the scanning equipment and the operators right there. 
>>> Like most other activities, cleanup after the fact is the least 
>>> desirable way to go about it.
>>>
>>> kc
>>>
>>> Patricia F Anderson wrote:
>>>> Perhaps take a folksonomy approach -- have a system by which 
>>>> patrons can report or recommend correction of errors they discover. 
>>>> A wikipedia model, perhaps. Just brainstorming, but it could take 
>>>> the burden of correction off the local coders.
>>>>
>>>>  -- Patricia Anderson, pfa at umich.edu
>>>>
>>>> On Mon, 4 Sep 2006, Perry Willett wrote:
>>>>
>>>>> We've been concentrating on releasing our access system first, so 
>>>>> we haven't thought much about it. I don't think there's any issue 
>>>>> about whether our agreement with Google will allow us--I think 
>>>>> it's something we are allowed to do. The sheer volume of the task 
>>>>> is daunting, however.
>>>>>
>>>>> Perry Willett
>>>>> Head, Digital Library Production Service
>>>>> 300 Hatcher North
>>>>> University of Michigan
>>>>> Ann Arbor MI 48109-1205
>>>>> Ph: 734-764-8074
>>>>> Fax: 734-647-6897
>>>>> Email: pwillett at umich.edu
>>>>>
>>>>>
>>>>> On Sat, 2 Sep 2006, Karen Coyle wrote:
>>>>>
>>>>>> Thank you. And I am SO glad that Michigan shows the underlying 
>>>>>> text (which Google doesn't -- at least not currently). Seeing the 
>>>>>> text, which is the input to the index, will help librarians and 
>>>>>> power users better understand search results and formulate 
>>>>>> strategies for searching. OCR has some quirks, and seeing them 
>>>>>> can only help.
>>>>>>
>>>>>> Another thought: any chance that Michigan (or any other Google 
>>>>>> libraries) will take on the task of correcting the OCR? (Assuming 
>>>>>> they have the right to do so.)
>>>>>>
>>>>>> kc
>>>>>>
>>>>>> Perry Willett wrote:
>>>>>>> Just to clear this up, we're getting both image and OCR files 
>>>>>>> from Google for each page. You'll see this specified in our 
>>>>>>> agreement with Google on p. 4:
>>>>>>> <http://www.lib.umich.edu/mdp/um-google-cooperative-agreement.pdf>
>>>>>>>
>>>>>>> Perry Willett
>>>>>>> Head, Digital Library Production Service
>>>>>>> 300 Hatcher North
>>>>>>> University of Michigan
>>>>>>> Ann Arbor MI 48109-1205
>>>>>>> Ph: 734-764-8074
>>>>>>> Fax: 734-647-6897
>>>>>>> Email: pwillett at umich.edu
>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> Date: Thu, 31 Aug 2006 14:07:43 -0700
>>>>>>>> From: Karen Coyle <kcoyle at kcoyle.net>
>>>>>>>> Subject: Re: [Web4lib] Google Allows Downloads of out-of-copyright Books
>>>>>>>>
>>>>>>>> Interesting example. If you go to page 1 you get a message saying 
>>>>>>>> "This page does not contain any text recoverable by the OCR engine." 
>>>>>>>> Is it possible that Michigan is providing OCR "on the fly?"
>>>>>>> _______________________________________________
>>>>>>> Web4lib mailing list
>>>>>>> Web4lib at webjunction.org
>>>>>>> http://lists.webjunction.org/web4lib/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> -----------------------------------
>>>>>> Karen Coyle / Digital Library Consultant
>>>>>> kcoyle at kcoyle.net http://www.kcoyle.net
>>>>>> ph.: 510-540-7596
>>>>>> fx.: 510-848-3913
>>>>>> mo.: 510-435-8234
>>>>>> ------------------------------------

