Fwd: DjVu questions

Grace Agnew gagnew at rci.rutgers.edu
Mon Apr 15 14:44:59 EDT 2002


>Delivered-To: gagnew at rci.rutgers.edu
>Date: Mon, 15 Apr 2002 13:30:24 -0400
>From: "Jeffery A. Triggs" <triggs at rci.rutgers.edu>
>X-Mailer: Mozilla 4.76 [en] (WinNT; U)
>X-Accept-Language: en
>To: bob at esrl.lib.md.us
>CC: Grace Agnew <gagnew at rci.rutgers.edu>
>Subject: DjVu questions
>
>Hi,
>
>Grace Agnew sent me your letter and asked if I would reply. Before
>coming here, I worked with the DjVu development group at AT&T Labs, and
>so have some background with the software that may be of use. I've also
>developed several large-scale DjVu sites, such as the Century Dictionary
>Online and DjVu Zone.
> >
> > > >>Our organization has a large number of text documents they would like to
> > > >>scan and have available on the Intranet. But they would need to be keyword
> > > >>searchable. Can DjVu handle this? Can a PHP front-end be built to support
> > > >>the searching through DjVu?
> >
>The DjVu specification allows for several annotation/metadata chunks,
>including a compressed OCR text chunk. These text chunks, which can be
>extracted with coordinates, corrected, and reinserted, allow a number of
>different keyword search possibilities. The internal text chunks can be
>searched using the DjVu viewer find function. The texts can also be
>extracted into ASCII metafiles, which can be indexed by external search
>engines. The Century site, for instance, makes use of both of these
>approaches. There is a fast search engine that accesses an index of the
>full text of all 12 volumes of the dictionary and returns the names of
>DjVu pages where a term appears. The user can also search each page (or
>all the pages of each multipage document, typically one volume)
>internally. Recently, we have had considerable success experimenting
>with highlighting search terms on the fly.
>
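The external-index approach described above can be sketched roughly as follows. This is only an illustration, not the Century site's actual search engine: it assumes the OCR text of each page has already been extracted to plain text (for example with a tool such as DjVuLibre's djvutxt), and the page names, sample text, and the build_index and search helpers are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page names containing it.

    `pages` is a dict of {page_name: extracted_text}. In practice the text
    would come from the OCR text layer of each DjVu page (assumption).
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample data standing in for text extracted from DjVu pages.
pages = {
    "vol01_p0001.djvu": "A, the first letter of the alphabet",
    "vol01_p0002.djvu": "Aback, toward the back; backward",
    "vol02_p0001.djvu": "Cab, a kind of hackney carriage; the first of its kind",
}
index = build_index(pages)
print(search(index, "first letter"))
```

A front end in PHP, Perl, or anything else would then only need to turn the returned page names into links; searching inside a page is still left to the viewer's own find function.
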
>You can use a variety of front ends to DjVu interfaces. At AT&T and the
>Century, I used PHP and Perl successfully, both separately and in
>combination depending on the needs of a particular interface. Here at
>Rutgers we've been using Cold Fusion and PHP with DjVu interfaces. The
>DjVu file format is very flexible and is addressable with varying
>granularity down to the individual page level.
>
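As a rough illustration of page-level addressing from a front end, the sketch below builds a link that asks the viewer to open a multipage document at a given page. It assumes the viewer honors the "?djvuopts&page=N" query convention used by DjVu browser plugins, which should be checked against whatever viewer is deployed; the host name, file name, and the djvu_page_url helper are hypothetical.

```python
from urllib.parse import quote


def djvu_page_url(base_url, document, page):
    """Build a link that asks the viewer to open `document` at `page` (1-based).

    Assumes the viewer understands the "?djvuopts&page=N" convention.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Percent-encode the file name so spaces and similar characters are safe.
    return f"{base_url.rstrip('/')}/{quote(document)}?djvuopts&page={page}"


# Hypothetical intranet host and volume file name.
print(djvu_page_url("http://intranet.example/djvu", "volume 03.djvu", 412))
```

Combined with a search index that returns page numbers, a link like this lets each search hit open directly at the matching page instead of at the start of the volume.
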
>If I can help with any other questions, feel free to contact me.
>
>Jeffery

Grace Agnew
Associate University Librarian for Digital Library Systems
Rutgers, the State University of New Jersey
Library Technical Services Building
47 Davidson Road
Piscataway, NJ  08854-5603

gagnew at rci.rutgers.edu
PH: (732) 445-5908
FAX: (732) 445-5888



