[WEB4LIB] DjVu scan-to-web
Thomas Bennett
bennetttm at appstate.edu
Mon Apr 15 13:06:58 EDT 2002
Text captured from
http://www.lizardtech.com/samples/view1.pl?image=/products/whitepapers/djvu_overview/directory.djvu&thumbnail=left
If the link above does not work for you, you may need to download and
install the DjVu plug-in, or check that the full URL made it into your
browser location bar, since mail clients may wrap it onto another line.
"
The foreground layer can be OCRed and the result embedded back into the DjVu
file as a searchable "hidden text" layer. Tools are available to extract that
text and translate it into an XML format that includes each word, together
with its bounding box coordinates on the page, and the document structure
(pages, columns, paragraphs, lines, words). Hyperlinks, annotations, page
thumbnails, and other metadata can also be embedded into DjVu documents.
Server-side full-text search can easily be provided using free indexing
tools and a few Perl scripts.
Large collections have been or are being put on the Web in DjVu with
full-text search capabilities, including the NIPS Proceedings (13 volumes,
14,000 pages at 400dpi, 191MB), the Century Dictionary (8 volumes), along
with several national library collections and content from commercial
providers around the world. DjVu is currently used by thousands of users to
publish and exchange scanned documents on the Web.
"
The search option in the plug-in allows searching the current page or the
entire document.
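For the server-side search the white paper mentions, here is a rough sketch
of the idea rather than anything LizardTech ships. It assumes the hidden text
has already been pulled out of each DjVu page by whatever extraction tool you
use, and it is written in Python instead of the Perl the white paper refers
to. The file names, sample text, and the build_index/search functions are all
made up for illustration.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of (document, page) keys.

    `pages` maps (document, page_number) to that page's extracted text.
    """
    index = defaultdict(set)
    for key, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(key)
    return index

def search(index, query):
    """Return sorted (document, page) keys containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical page text standing in for the extracted hidden-text layer.
pages = {
    ("report.djvu", 1): "Annual budget for the regional library",
    ("report.djvu", 2): "Library circulation statistics",
    ("minutes.djvu", 1): "Board meeting minutes and budget vote",
}
index = build_index(pages)
print(search(index, "budget"))
print(search(index, "library budget"))
```

A web front-end in any language would only need to pass the query to
something like search() and link each hit to the matching document and page.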
Thomas
-----Original Message-----
From: web4lib at webjunction.org
[mailto:web4lib at webjunction.org]On Behalf Of bob at esrl.lib.md.us
Sent: Monday, April 15, 2002 12:01 PM
To: Multiple recipients of list
Subject: [WEB4LIB] DjVu scan-to-web
Does anyone have experience with LizardTech's DjVu software?
Our organization has a large number of text documents they would like to
scan and have available on the Intranet. But they would need to be keyword
searchable. Can DjVu handle this? Can a PHP front-end be built to support
the searching through DjVu?
--
Bob Long, Eastern Shore Regional Library
Senior Systems Technician
410 479 0776 (v)
410 548 5807 (f)