[WEB4LIB] Sample pages at Amazon
Roy Tennant
roy.tennant at ucop.edu
Wed Oct 10 16:30:22 EDT 2001
While I applaud thinking "out of the card catalog" (our own
particular box), and Amazon has been useful in forcing us to do so,
I'd have to say that their recent jump into added content should have
been better thought out. For example, the page images I looked at
(pages from the book's index) were unreadable, and clicking on the
page image, which is the standard way to request a larger version of
an image, simply turned the page (not intuitive, at least to me).
In 1998 I spec'd out what it would take to use student assistants to
scan the tables of contents and index pages from non-fiction books as
they were returned to the UC Berkeley Library (the scheme included a
method by which to siphon off the most popular items for scanning).
You can still see a very simple demonstration that I quickly
cobbled together as an illustration for the proposal (go to
http://sunsite.berkeley.edu/PEP/ and search on "apartheid"). All the
students had to do was to look up a book in the catalog, use the
unique record id as the base for all subsequent filenames of the
scanned pages, and adhere to a particular naming scheme for those
files. The files would be OCR'd (uncorrected) and indexed, but only
the page images would be displayed. All the labor was in the
scanning, etc., which presumably would be relatively cheap since
there is only a small set of specific, well-defined tasks to be
performed (and especially cheap at a university with
slave...uh..student labor). Everything else (indexing, links in the
catalog, etc.) was either automated already or could have easily been.
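For illustration only, here is a minimal sketch of what a record-id-based naming scheme of this kind might look like. The filename pattern, the "toc"/"index" section labels, the ".gif" extension, and the function names are all assumptions made for the example; the proposal's actual scheme is not given here, and Python stands in for the Perl used in the demonstration.

```python
import re
from typing import NamedTuple

# Hypothetical scheme (an assumption, not the one from the proposal):
#   <record_id>_<section>_<page>.gif   e.g. b1234567_toc_001.gif
SECTIONS = ("toc", "index")
NAME_RE = re.compile(
    r"^(?P<record_id>[A-Za-z0-9]+)_(?P<section>toc|index)_(?P<page>\d{3})\.gif$"
)


class ScanName(NamedTuple):
    record_id: str
    section: str
    page: int


def make_name(record_id: str, section: str, page: int) -> str:
    """Build the filename a scanner operator would save a page image under."""
    if not re.fullmatch(r"[A-Za-z0-9]+", record_id):
        raise ValueError(f"record id must be alphanumeric: {record_id!r}")
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    if not 1 <= page <= 999:
        raise ValueError(f"page out of range: {page}")
    # Zero-padding keeps filenames in page order when sorted as plain text.
    return f"{record_id}_{section}_{page:03d}.gif"


def parse_name(filename: str) -> ScanName:
    """Recover the record id, section, and page number from a filename."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a scan filename: {filename!r}")
    return ScanName(match["record_id"], match["section"], int(match["page"]))
```

Because the record id is embedded in every filename, nothing else needs to be recorded at scan time: the link between a page image and its catalog record can be recovered from the name alone.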
The entire interface is all of 163 lines of Perl code, including
blank lines for readability, comments, and laboriously spelled-out
algorithms (hey, I'm not a programmer). These 163 lines not only
build the display but also determine several things on the fly:
* whether an index has an associated table of contents, or vice versa
* which page is being displayed and therefore which page is before or
after it, thus allowing the navigation to be completely contextual
* how to link into the catalog record (the link is currently broken
due to changes in the catalog system) so the person could find out if
it was on the shelf
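The three on-the-fly decisions above could be sketched roughly as follows. This is not the original 163 lines of Perl; it is a small Python illustration that assumes a hypothetical filename pattern of <record_id>_<toc|index>_<page>.gif and a made-up catalog URL (catalog.example.edu), neither of which comes from the actual system.

```python
import re

# Assumed filename pattern and catalog URL, both hypothetical.
NAME_RE = re.compile(r"^([A-Za-z0-9]+)_(toc|index)_(\d{3})\.gif$")
CATALOG_URL = "https://catalog.example.edu/record/{record_id}"


def section_pages(all_files, record_id, section):
    """Return one book's page images for one section, in page order."""
    prefix = f"{record_id}_{section}_"
    return sorted(f for f in all_files if f.startswith(prefix) and NAME_RE.match(f))


def page_context(current, all_files):
    """Work out the navigation for the page image being displayed."""
    match = NAME_RE.match(current)
    if match is None:
        raise ValueError(f"not a scan filename: {current!r}")
    record_id, section, _page = match.groups()

    pages = section_pages(all_files, record_id, section)
    if current not in pages:
        raise ValueError(f"{current!r} is not among the scanned files")
    position = pages.index(current)

    # An index page offers a jump to the table of contents, and vice versa,
    # but only when the other section was actually scanned for this book.
    other = "index" if section == "toc" else "toc"
    other_pages = section_pages(all_files, record_id, other)

    return {
        "previous": pages[position - 1] if position > 0 else None,
        "next": pages[position + 1] if position + 1 < len(pages) else None,
        "other_section": other_pages[0] if other_pages else None,
        "catalog_link": CATALOG_URL.format(record_id=record_id),
    }
```

Given the files b1_toc_001.gif, b1_toc_002.gif, and b1_index_001.gif, asking for the context of b1_toc_001.gif yields no previous page, b1_toc_002.gif as the next page, and b1_index_001.gif as the jump to the index.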
The entire thing, from specs to code to sample scans to index, took a
few days -- max. So are we talking rocket science? Of course not. Any
two-bit institution with a living being above the intelligence of a
chimpanzee, a $100 scanner, and one or more hours a week to throw at
it could do it. So why did Berkeley not take me up on my proposal?
Why, several years later, is no one doing much at all about it? To
this day I simply don't understand it.
Roy
> Amazon has made some changes to the books section of their Web site that
>allow you to view sample pages of titles. The home page highlights the new
>"Look Inside" feature that is available for "thousands of books", including
>children's titles. For example, if you go to
>http://www.amazon.com/exec/obidos/ASIN/068982954X/103-2992880-3883837 to
>find "Olivia Saves the Circus", you can view the back cover, an excerpt from
>the book, the front & back flaps, and the intro pages (8 sample pages
>total). Other titles let you view the table of contents, the index, and
>more. The title "Animal: The Definitive Visual Guide to the World's
>Wildlife" includes 112 sample pages
>(http://www.amazon.com/exec/obidos/ASIN/0789477645/103-2992880-3883837), and
>this item hasn't even been published yet.
> From a cursory glance, it looks like they are scanning in each page
>and displaying them as standard images in the browser, which essentially
>means they have their own digitizing project. At the top of each image is
>the phrase "Copyrighted material", which is just another version of the
>signs we put on our photocopy machines.
> Looks like they scooped libraries again and are offering another service
>that we should be integrating into our catalogs. How would we pull this
>off?
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Jenny Levine 125 Tower Drive
>Internet Development Specialist Burr Ridge, IL 60527
>Suburban Library System +1 (630) 734 5141
>http://www.sls.lib.il.us/ levinej at sls.lib.il.us
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~