Scanning text

Thu Mar 14 13:36:22 EST 2002

>What's new for us is text scanning. We would like to scan the contents of
>several (public domain) books in non-OCR mode - we're simply doing the
>images of the pages. We're not sure how we make the page scans of the book
>contents available to users as a single unit, if that makes sense, i.e.
>they would click on a link to "Algoma Mines" and then voila, the scanned
>book magically appears in its entirety. ;) I know it's do-able, since books
>like this are visible all over the Web. But *how* did they do it?

If I might make a suggestion...

From my experience in putting books online - mostly local histories and
genealogies, but also the city master plan - users do not like to read books as
page images.

They will do it if it's the only way, and offering the images alongside the text
is also nice. But if you actually expect them to *read through* the material on
a computer, making them do it one graphic at a time just makes them cranky.

OCRed text has other obvious advantages: faster loading, indexing by search
engines, and accessibility for users with screen readers or slow connections.
Here, though, I'm referring only to general usability.

I'm sure some people have had success to the contrary with PDF files etc., and
perhaps my users are just particularly finicky.  But I doubt it. :-)

Bob Sullivan                               scp_sulli at sals.edu
Schenectady County Public Library (NY)     http://www.scpl.org