[Web4lib] Google Allows Downloads of out-of-copyright Books

Tue Sep 5 11:54:48 EDT 2006

Projekt Runeberg in Sweden has been taking this approach for years now,
with an impressively simple setup. A sample page:
http://runeberg.org/svedelif/0126.html . Page image at the top, raw OCR
text at the bottom, click a link to edit the text in a simple in-browser
editor. If the page has been proof-read before, there's a wiki-style
history page (e.g.
http://runeberg.org/rc.pl?action=history&src=svedelif/0003 ). I'd love
to try something like this with our projects; talk about engaging users
and building community! The problem would be maintaining the more
complex underlying text structure, which assigns a page position to each
word (so that the search term can be highlighted in the retrieved page
image, as Google does). You can't just put up text and let someone edit
it; you need a system that keeps track of each word. It
certainly could be done, but we haven't tried to build it yet. (And then
of course there are all the usual issues around monitoring and vetting
user-submitted content.)
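
To make the word-tracking problem concrete, here is a rough Python sketch
of one way an edited page could be mapped back onto the OCR word
positions. Everything in it is an assumption for illustration: the Word
record, the pixel bounding-box format, and the apply_correction and
union_box helpers are hypothetical names, not part of Projekt Runeberg's
setup or of any system we run. It aligns the corrected words against the
original ones with Python's standard difflib.SequenceMatcher, keeps the
box for unchanged or one-for-one corrected words, and gives the merged
region to words whose count changed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

# Assumed box format: (left, top, right, bottom) in page-image pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str
    box: Optional[Box]  # None when no page position is known


def union_box(boxes: Sequence[Optional[Box]]) -> Optional[Box]:
    """Smallest box covering all known boxes, or None if there are none."""
    known = [b for b in boxes if b is not None]
    if not known:
        return None
    return (
        min(b[0] for b in known),
        min(b[1] for b in known),
        max(b[2] for b in known),
        max(b[3] for b in known),
    )


def apply_correction(ocr_words: List[Word], corrected_text: str) -> List[Word]:
    """Map user-corrected text back onto OCR words, keeping positions."""
    old_tokens = [w.text for w in ocr_words]
    new_tokens = corrected_text.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)

    result: List[Word] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Untouched words keep their original position.
            result.extend(ocr_words[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # One-for-one fixes (e.g. "Tbe" -> "The") reuse each box.
            for old, new in zip(ocr_words[i1:i2], new_tokens[j1:j2]):
                result.append(Word(new, old.box))
        elif tag in ("replace", "insert"):
            # Word count changed: share the region of the replaced span.
            # For a pure insert that span is empty, so the box is None.
            region = union_box([w.box for w in ocr_words[i1:i2]])
            result.extend(Word(t, region) for t in new_tokens[j1:j2])
        # "delete": the removed words are simply dropped.
    return result
```

With OCR words "Tbe quick bro wn fox" and a corrected line "The quick
brown fox", "The" keeps the box of "Tbe" and "brown" gets the region
spanning "bro" and "wn", so hit highlighting would still work. Words
inserted where the OCR had nothing end up with no box and would need
placing by hand or by some other heuristic.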

Peter

Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: peter.binkley at ualberta.ca

-----Original Message-----
From: web4lib-bounces at webjunction.org
[mailto:web4lib-bounces at webjunction.org] On Behalf Of Patricia F
Anderson
Sent: Monday, September 04, 2006 1:10 PM
To: Perry Willett
Cc: Karen Coyle; web4lib at webjunction.org
Subject: Re: [Web4lib] Google Allows Downloads of out-of-copyright Books

Perhaps take a folksonomy approach -- have a system by which patrons can
report or recommend correction of errors they discover. A Wikipedia
model, perhaps. Just brainstorming, but it could take the burden of
correction off the local coders.

  -- Patricia Anderson, pfa at umich.edu

On Mon, 4 Sep 2006, Perry Willett wrote:

> We've been concentrating on releasing our access system first, so we 
> haven't thought much about it. I don't think there's any issue about 
> whether our agreement with Google will allow us--I think it's 
> something we are allowed to do. The sheer volume of the task is
> daunting, however.
>
> Perry Willett
> Head, Digital Library Production Service
> 300 Hatcher North
> University of Michigan
> Ann Arbor MI 48109-1205
> Ph: 734-764-8074
> Fax: 734-647-6897
> Email: pwillett at umich.edu
>
>
> On Sat, 2 Sep 2006, Karen Coyle wrote:
>
>> Thank you. And I am SO glad that Michigan shows the underlying text
>> (which Google doesn't -- at least not currently). Seeing the text,
>> which is the input to the index, will help librarians and power users
>> better understand search results and formulate strategies for
>> searching. OCR has some quirks, and seeing them can only help.
>> 
>> Another thought: any chance that Michigan (or any other Google 
>> libraries) will take on the task of correcting the OCR? (Assuming 
>> they have the right to do so.)
>> 
>> kc
>> 
>> Perry Willett wrote:
>>> Just to clear this up, we're getting both image and OCR files from 
>>> Google for each page. You'll see this specified in our agreement 
>>> with Google on p. 4:
>>> <http://www.lib.umich.edu/mdp/um-google-cooperative-agreement.pdf>
>>> 
>>> Perry Willett
>>> Head, Digital Library Production Service
>>> 300 Hatcher North
>>> University of Michigan
>>> Ann Arbor MI 48109-1205
>>> Ph: 734-764-8074
>>> Fax: 734-647-6897
>>> Email: pwillett at umich.edu
>>> 
>>>> ------------------------------
>>>> Date: Thu, 31 Aug 2006 14:07:43 -0700
>>>> From: Karen Coyle <kcoyle at kcoyle.net>
>>>> Subject: Re: [Web4lib] Google Allows Downloads of out-of-copyright 
>>>> Books
>>>> 
>>>> Interesting example. If you go to page 1 you get a message saying 
>>>> "This page does not contain any text recoverable by the OCR 
>>>> engine." Is it possible that Michigan is providing OCR "on the
>>>> fly?"
>>> _______________________________________________
>>> Web4lib mailing list
>>> Web4lib at webjunction.org
>>> http://lists.webjunction.org/web4lib/
>>> 
>>> 
>> 
>> --
>> -----------------------------------
>> Karen Coyle / Digital Library Consultant
>> kcoyle at kcoyle.net
>> http://www.kcoyle.net
>> ph.: 510-540-7596
>> fx.: 510-848-3913
>> mo.: 510-435-8234
>> ------------------------------------