[WEB4LIB] Sample pages at Amazon

Roy Tennant roy.tennant at ucop.edu
Wed Oct 10 16:30:22 EDT 2001


While I applaud thinking "out of the card catalog" (our own 
particular box), and Amazon has been useful in forcing us to do so, 
I'd have to say that their recent jump into added content should have 
been better thought out. For example, the page images I looked at 
(pages from the book's index) were unreadable, and clicking on the 
page image, which is the standard way to request a larger version of 
an image, simply turned the page (not intuitive, at least to me).

In 1998 I spec'd out what it would take to use student assistants to 
scan the tables of contents and index pages from non-fiction books as 
they were returned to the UC Berkeley Library (the scheme included a 
method by which to siphon off the most popular items for scanning). 
You can still see a very simple demonstration that I quickly 
cobbled together to illustrate the proposal (go to 
http://sunsite.berkeley.edu/PEP/ and search on "apartheid"). All the 
students had to do was to look up a book in the catalog, use the 
unique record id as the base for all subsequent filenames of the 
scanned pages, and adhere to a particular naming scheme for those 
files. The files would be OCR'd (uncorrected) and indexed, but only 
the page images would be displayed. All the labor was in the 
scanning, etc., which presumably would be relatively cheap since 
there is only a small set of specific, well-defined tasks to be 
performed (and especially cheap at a university with 
slave...uh..student labor).  Everything else (indexing, links in the 
catalog, etc.) was either automated already or could have easily been.
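
A minimal sketch of what such a naming scheme could look like, written 
in Python rather than the Perl of the demonstration. The post does not 
give the actual scheme, so the pattern used here (record id, then a 
section code of "toc" or "idx", then a page sequence number) and the 
names page_filename and parse_filename are assumptions for illustration 
only.

```python
import re

# Hypothetical section codes: "toc" = table of contents, "idx" = index.
SECTIONS = ("toc", "idx")

# Assumed filename pattern: <record id>_<section>_<sequence>.<extension>
PATTERN = re.compile(
    r"^(?P<record>[A-Za-z0-9]+)_(?P<section>toc|idx)_(?P<seq>\d+)\.(?P<ext>gif|txt)$"
)


def page_filename(record_id, section, seq, ext="gif"):
    """Build the filename for one scanned page of a catalog record."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    if seq < 1:
        raise ValueError("page sequence numbers start at 1")
    # Zero-pad the sequence so filenames sort in page order.
    return f"{record_id}_{section}_{seq:03d}.{ext}"


def parse_filename(name):
    """Return (record_id, section, seq) or None if the name is off-scheme."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return match.group("record"), match.group("section"), int(match.group("seq"))
```

Because the record id is the base of every filename, a page image 
(.gif) and its uncorrected OCR text (.txt) can be paired with each 
other and with the catalog record without any separate lookup table.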

The entire interface is all of 163 lines of Perl code, including 
blank lines for readability, comments, and laboriously spelled out 
algorithms (hey, I'm not a programmer). These 163 lines not only 
build the display but also determine several things on the fly:

* whether an index has an associated table of contents, or vice versa
* which page is being displayed and therefore which page is before or 
after it, thus allowing the navigation to be completely contextual
* how to build a link into the catalog record (currently broken due to 
changes in the catalog system) so the person can find out whether the 
book is on the shelf
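
The three on-the-fly decisions listed above can be sketched roughly as 
follows. This is Python, not the original Perl, and it is not the 
actual code: the filename pattern, the navigation function, and the 
catalog URL are all placeholders assumed for the example.

```python
import re

# Assumed filename pattern: <record id>_<toc|idx>_<sequence>.gif
NAME = re.compile(r"^(?P<record>[A-Za-z0-9]+)_(?P<section>toc|idx)_(?P<seq>\d+)\.gif$")

# Placeholder URL; the real catalog link format is not given in the post.
CATALOG_URL = "http://catalog.example.edu/record/{record_id}"


def navigation(current, available):
    """Work out the contextual links for the page image being displayed."""
    match = NAME.match(current)
    if match is None:
        raise ValueError(f"not a page image name: {current!r}")
    record, section = match.group("record"), match.group("section")

    # Group this record's page images by section, keyed by page sequence.
    pages = {"toc": [], "idx": []}
    for name in available:
        hit = NAME.match(name)
        if hit and hit.group("record") == record:
            pages[hit.group("section")].append((int(hit.group("seq")), name))

    same = [name for _, name in sorted(pages[section])]
    pos = same.index(current)  # raises ValueError if current is not available

    # Does the index have an associated table of contents, or vice versa?
    other = sorted(pages["idx" if section == "toc" else "toc"])

    return {
        "previous": same[pos - 1] if pos > 0 else None,
        "next": same[pos + 1] if pos + 1 < len(same) else None,
        "other_section": other[0][1] if other else None,
        "catalog": CATALOG_URL.format(record_id=record),
    }
```

Everything is derived from the filenames alone, which is why a strict 
naming scheme lets the navigation be built with no per-book 
configuration.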

The entire thing, from specs to code to sample scans to index, took a 
few days -- max. So are we talking rocket science? Of course not. Any 
two-bit institution with a living being above the intelligence of a 
chimpanzee, a $100 scanner, and one or more hours a week to throw at 
it could do it. So why did Berkeley not take me up on my proposal? 
Why, several years later, is no one doing much at all about it? To 
this day I simply don't understand it.
Roy

>     Amazon has made some changes to the books section of their Web site that
>allow you to view sample pages of titles.  The home page highlights the new
>"Look Inside" feature that is available for "thousands of books", including
>children's titles.  For example, if you go to
>http://www.amazon.com/exec/obidos/ASIN/068982954X/103-2992880-3883837 to
>find "Olivia Saves the Circus", you can view the back cover, an excerpt from
>the book, the front & back flaps, and the intro pages (8 sample pages
>total).  Other titles let you view the table of contents, the index, and
>more.  The title "Animal: The Definitive Visual Guide to the World's
>Wildlife" includes 112 sample pages
>(http://www.amazon.com/exec/obidos/ASIN/0789477645/103-2992880-3883837), and
>this item hasn't even been published yet.
>     From a cursory glance, it looks like they are scanning in each page
>and displaying them as standard images in the browser, which essentially
>means they have their own digitizing project.  At the top of each image is
>the phrase "Copyrighted material", which is just another version of the
>signs we put on our photocopy machines.
>     Looks like they scooped libraries again and are offering another service
>that we should be integrating into our catalogs.  How would we pull this
>off?
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Jenny Levine                                          125 Tower Drive
>Internet Development Specialist              Burr Ridge, IL 60527
>Suburban Library System                       +1 (630) 734 5141
>http://www.sls.lib.il.us/                            levinej at sls.lib.il.us
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


