[Web4lib] Best Practices for Archival Digitization of multi-page sources

Allison Zhang zhang at wrlc.org
Fri Oct 23 12:17:02 EDT 2009


Keith,

We experienced a problem similar to yours several years ago. I had to split hundreds of multi-page TIFF files. I found a shareware program that did the job in batch, and it worked really well. I don't remember the name of the software, but I just searched Google and found a shareware program called "TIFF Splitter", which sounds like the one I used. You may want to test it.

How many double pages do you have? If you have a lot of them, then after you split the multi-page TIFF files into individual TIFF files, you can run Photoshop batch jobs to split those double pages, provided they were all scanned at the same size (same dimensions). This may take several steps. You may contact me offline if you are interested.
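To illustrate why identical dimensions matter for a batch job: if every double-page scan has the same width and height, the two crop rectangles can be computed once and reused for every image. Below is a minimal Python sketch of that arithmetic. The function name `split_boxes` and the `overlap` parameter are hypothetical, and the sketch assumes the fold between the two pages sits at the horizontal center of the scan, which should be checked against real scans. The boxes use the (left, upper, right, lower) pixel convention that many imaging tools accept for cropping.

```python
def split_boxes(width, height, overlap=0):
    """Return crop boxes (left, upper, right, lower) for the left and right page.

    Assumes the fold is at the horizontal center of the scan.
    `overlap` keeps that many extra pixels across the center on each side,
    in case the fold is slightly off-center.
    """
    if width < 2 or height < 1:
        raise ValueError("image is too small to split")
    mid = width // 2
    if not 0 <= overlap <= mid:
        raise ValueError("overlap must be between 0 and half the width")
    left_page = (0, 0, mid + overlap, height)
    right_page = (mid - overlap, 0, width, height)
    return left_page, right_page


if __name__ == "__main__":
    # Example dimensions only; substitute the real size of the scans.
    left, right = split_boxes(5000, 3500, overlap=20)
    print("left page box: ", left)
    print("right page box:", right)
```

If the scans differ in size, or the fold is not centered, the boxes would have to be computed per image rather than once for the whole batch.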

The TIFF files created several years ago may not be of good quality. We had that problem: I had to do some Photoshop work to enhance readability, and we had to re-scan over 800 pages.


Hope this helps.


Allison 



Allison Zhang
Manager, Digital Collections Production Center
Washington Research Library Consortium
zhang at wrlc.org
301-390-2049


________________________________________
From: web4lib-bounces at webjunction.org [web4lib-bounces at webjunction.org] On Behalf Of Keith D. Engwall [kengwall at catawba.edu]
Sent: Friday, October 23, 2009 11:42 AM
To: web4lib
Subject: [Web4lib] Best Practices for Archival Digitization of multi-page sources

We took on a project at our library several years ago to digitize our
campus’ old newspapers.  At the time, we saved our scans as multi-page
TIFF files.  We intended to use these as the raw originals, and then
convert copies of these files to PDF for use by patrons.  With the chaos
of a renovation and other factors, we set things aside, but are now
looking at returning to the project.



I have discovered that we made one particularly challenging mistake in
that some of the newspapers were scanned 2 pages at a time.  This makes
it difficult to actually view the newspaper, since the “pages” of the
digital file do not correspond with the pages of the newspaper, and one
must do a lot of cumbersome scrolling in order to navigate the pages
(particularly the right-hand page).  I am in the process of putting
together a project to split these double-scans into separate pages, and
I have discovered something that has me questioning a fundamental aspect
of how we are storing these files.



The whole concept of a multi-page TIFF, it turns out, was primarily used
for faxes, and over the years, as fax machines have become obsolete
technology, so has the multi-page TIFF.  Relatively few software
products even read multi-page TIFFs (they only recognize the first
page), and fewer still will write to them.
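For context on why some software shows only the first page: in the classic TIFF layout, each page is described by an image file directory (IFD), and each IFD ends with the offset of the next one (0 marks the end). A reader that stops after the first IFD sees only page one. The following is a minimal Python sketch, using only the standard library, that walks that chain to count pages without decoding any image data. The function name `count_tiff_pages` is hypothetical. It assumes classic TIFF (not BigTIFF) and counts every IFD in the main chain as a page, so a file that stores thumbnails as extra IFDs would be overcounted.

```python
import struct


def count_tiff_pages(f):
    """Count IFDs in the main chain of a classic TIFF, given a binary file object."""
    header = f.read(8)
    if len(header) < 8 or header[:2] not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    # "II" means little-endian, "MM" means big-endian.
    endian = "<" if header[:2] == b"II" else ">"
    magic, offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic number is not 42)")

    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(offset)
        f.seek(offset)
        (n_entries,) = struct.unpack(endian + "H", f.read(2))
        pages += 1
        # Skip the 12-byte directory entries to reach the next-IFD offset.
        f.seek(offset + 2 + 12 * n_entries)
        (offset,) = struct.unpack(endian + "I", f.read(4))
    return pages
```

Because it only seeks between directories, a check like this should stay fast even on files that take minutes to open in an image viewer. It could be called as `count_tiff_pages(open(path, "rb"))`, where `path` is a placeholder for one of the scan files.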



I’m thinking that I should split each of these files so that we have a
separate image per page (stored in a directory for each issue).  I know
that I should store these in an uncompressed format.  Is TIFF still the
preferred format, or is there a different format that is preferred?



It would be nice to do this as a batch job, but frankly, given the size
of these files, I don’t know whether that is practical on the hardware
available to me.  Just opening one of these files takes several minutes.



If anyone has any thoughts on this, I’d appreciate it.  Also, if there
is a list that is more focused on this kind of thing, I’d appreciate
knowing that as well.


Thanks,



Keith



Keith Engwall
Systems Librarian
Catawba College Library
kengwall at catawba.edu




_______________________________________________
Web4lib mailing list
Web4lib at webjunction.org
http://lists.webjunction.org/web4lib/