[Web4lib] Best Practices for Archival Digitization of multi-page sources

Keith D. Engwall kengwall at catawba.edu
Fri Oct 23 11:42:35 EDT 2009


We took on a project at our library several years ago to digitize our
campus’ old newspapers.  At the time, we saved our scans as multi-page
TIFF files.  We intended to use these as the raw originals, and then
convert copies of these files to PDF for use by patrons.  With the chaos
of a renovation and other factors, we set things aside, but are now
looking at returning to the project.  

 

I have discovered that we made one particularly troublesome mistake:
some of the newspapers were scanned two pages at a time.  This makes
it difficult to actually view the newspaper, since the “pages” of the
digital file do not correspond with the pages of the newspaper, and one
must do a lot of cumbersome scrolling in order to navigate the pages
(particularly the right-hand page).  I am in the process of putting
together a project to split these double-scans into separate pages, and
I have discovered something that has me questioning a fundamental aspect
of how we are storing these files.
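
For illustration, one way the split itself might be sketched in Python.
This assumes the fold sits roughly at the horizontal midpoint of each
scan, which would need checking against the actual images.  The names
split_boxes and split_spread and the overlap margin are hypothetical;
the only library behaviour relied on is the crop method of a Pillow
image, which takes a (left, upper, right, lower) box.

```python
def split_boxes(width, height, overlap=0):
    """Return (left_box, right_box) crop rectangles for a two-page scan.

    Boxes are (left, upper, right, lower) in pixels, the convention used
    by Pillow's Image.crop. `overlap` is a hypothetical safety margin, in
    pixels, kept on both sides of the midline in case the fold is not
    exactly centered.
    """
    middle = width // 2
    left_box = (0, 0, min(width, middle + overlap), height)
    right_box = (max(0, middle - overlap), 0, width, height)
    return left_box, right_box


def split_spread(spread, overlap=0):
    """Crop one scanned spread (assumed to be a Pillow Image) into two pages."""
    left_box, right_box = split_boxes(spread.width, spread.height, overlap)
    return spread.crop(left_box), spread.crop(right_box)
```

Keeping the box arithmetic separate from the cropping means the
geometry can be checked on a few sample dimensions before any large
file is opened.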

 

The whole concept of a multi-page TIFF, it turns out, was primarily used
for faxes, and over the years, as fax machines have become obsolete
technology, so has the multi-page TIFF.  As far as I can tell, relatively
few software products read multi-page TIFFs properly (most recognize only
the first page), and fewer still will write them.

 

I’m thinking that I should split each of these files so that we have a
separate image per page (stored in a directory for each issue).  I know
that I should store these in an uncompressed format.  Is TIFF still the
preferred format, or is there a different format that is preferred?  
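
A minimal sketch of what that layout could look like, assuming Python
with the third-party Pillow library.  The directory scheme, the
page_NNN.tif naming, and the function names are hypothetical choices,
not an established convention.  The sketch steps through the file one
frame at a time and writes each frame as its own uncompressed TIFF.

```python
from pathlib import Path


def page_path(archive_root, issue_id, page_number):
    """Build a path such as <archive_root>/<issue_id>/page_007.tif."""
    return Path(archive_root) / issue_id / f"page_{page_number:03d}.tif"


def split_issue(multipage_tiff, archive_root, issue_id):
    """Write each frame of a multi-page TIFF as a separate TIFF file."""
    from PIL import Image  # Pillow, imported lazily; must be installed

    written = []
    with Image.open(multipage_tiff) as scan:
        for index in range(scan.n_frames):
            scan.seek(index)  # select one frame of the multi-page file
            target = page_path(archive_root, issue_id, index + 1)
            target.parent.mkdir(parents=True, exist_ok=True)
            # "raw" is Pillow's name for uncompressed TIFF output.
            scan.save(target, format="TIFF", compression="raw")
            written.append(target)
    return written
```

A call might look like split_issue("1952-03-14.tif", "pages",
"1952-03-14"), where the file name and issue identifier are made up
for the example.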

 

It would be nice to do this as a batch job, but frankly, given the size
of these files, I don’t know whether that is practical on the hardware
available to me.  Just opening one of these files takes several minutes.
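
If a batch run does turn out to be feasible, one option is to make it
resumable, so that a slow job can be stopped and restarted without
redoing finished issues.  The sketch below is in Python and assumes,
hypothetically, one multi-page TIFF per issue in a source directory,
with each file's stem reused as the name of that issue's output
directory.

```python
from pathlib import Path


def pending_scans(source_dir, archive_root):
    """Yield multi-page TIFFs that do not yet have an issue directory.

    Assumes (hypothetically) one file per issue in `source_dir`, named
    like 1952-03-14.tif, with the file stem reused as the directory name.
    """
    for scan in sorted(Path(source_dir).glob("*.tif")):
        if not (Path(archive_root) / scan.stem).is_dir():
            yield scan
```

Each file yielded would then be handed to whatever routine does the
per-page split.  A directory left half-written by a failed run would
count as done under this check, so it would need to be deleted before
rerunning.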

 

If anyone has any thoughts on this, I’d appreciate it.  Also, if there
is a list that is more focused on this kind of thing, I’d be glad to
know about it.


Thanks,

Keith

Keith Engwall
Systems Librarian
Catawba College Library
kengwall at catawba.edu

 



