[Web4lib] Best Practices for Archival Digitization of multi-page sources
Keith D. Engwall
kengwall at catawba.edu
Fri Oct 23 11:42:35 EDT 2009
We took on a project at our library several years ago to digitize our
campus’s old newspapers. At the time, we saved our scans as multi-page
TIFF files. We intended to use these as the raw originals, and then
convert copies of these files to PDF for use by patrons. With the chaos
of a renovation and other factors, we set things aside, but are now
looking at returning to the project.
I have discovered that we made one particularly troublesome mistake:
some of the newspapers were scanned two pages at a time. This makes
it difficult to actually view the newspaper, since the “pages” of the
digital file do not correspond with the pages of the newspaper, and one
must do a lot of cumbersome scrolling in order to navigate the pages
(particularly the right-hand page). I am in the process of putting
together a project to split these double-scans into separate pages, and
I have discovered something that has me questioning a fundamental aspect
of how we are storing these files.
The whole concept of a multi-page TIFF, it turns out, was primarily used
for faxes, and over the years, as fax machines have become obsolete
technology, so has the multi-page TIFF. Relatively few software
products read multi-page TIFFs fully (many recognize only the first
page), and fewer still will write them.
I’m thinking that I should split each of these files so that we have a
separate image per page (stored in a directory for each issue). I know
that I should store these in an uncompressed format. Is TIFF still the
preferred format, or is there a different format that is preferred?
It would be nice to do this as a batch job, but frankly, given the size
of these files, I don’t know whether that is practical on the hardware
available to me. Just opening one of these files takes several minutes.
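To make the idea concrete, here is a minimal sketch of the split described above, assuming Python with the Pillow imaging library is an option. The function names (half_page_boxes, page_path, split_issue), the page_0001.tif naming scheme, and the gutter parameter are all hypothetical choices for illustration, not an established workflow. It walks the multi-page TIFF one frame at a time, which may help with memory on modest hardware, cuts each double-page scan down the middle, and writes each half as its own TIFF in a directory for the issue. It assumes the fold sits at the exact center of the scan, which would need checking against real files, and resolution metadata may not carry over to the output without extra handling.

```python
from pathlib import Path


def half_page_boxes(width, height, gutter=0):
    """Return (left, right) crop boxes for a two-page scan.

    Boxes use the (left, upper, right, lower) pixel convention.
    `gutter` is a hypothetical number of pixels trimmed on each side
    of the center fold; the fold is assumed to be at width // 2.
    """
    mid = width // 2
    return (0, 0, mid - gutter, height), (mid + gutter, 0, width, height)


def page_path(out_dir, page_number):
    """Build a zero-padded file name such as page_0003.tif (assumed scheme)."""
    return Path(out_dir) / f"page_{page_number:04d}.tif"


def split_issue(src, out_dir, double_page=True, gutter=0):
    """Write each newspaper page found in `src` as a separate TIFF file."""
    # Imported here so the two helpers above work without Pillow installed.
    from PIL import Image, ImageSequence

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with Image.open(src) as tiff:
        # The iterator seeks through the file one frame at a time.
        for frame in ImageSequence.Iterator(tiff):
            if double_page:
                boxes = half_page_boxes(frame.width, frame.height, gutter)
            else:
                boxes = [(0, 0, frame.width, frame.height)]
            for box in boxes:
                target = page_path(out_dir, len(written) + 1)
                # compression=None requests uncompressed TIFF output.
                frame.crop(box).save(target, format="TIFF", compression=None)
                written.append(target)
    return written
```

A batch run could then loop over the issue files and call, for example, split_issue("1954-03-12.tif", "pages/1954-03-12"), where both paths are made-up placeholders. Trying it on a single issue first would show whether the hardware copes before committing to the whole collection.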
If anyone has any thoughts on this, I’d appreciate it. Also, if there
is a list that is more focused on this kind of thing, I’d appreciate
knowing that as well.
Thanks,
Keith
Keith Engwall
Systems Librarian
Catawba College Library
kengwall at catawba.edu