[Web4lib] Extracting images from PDF?
Jonathan Gorman
jtgorman at uiuc.edu
Wed Dec 14 16:16:24 EST 2005
Can't say I have any "real world experience". But some programs I'd
look into are xpdf/pdfimages (also pdftohtml, which uses these) and
perhaps ImageMagick. I'm not sure what the state of the Windows ports
of these programs is.
Do you know what type of images are contained in the PDF file? Are they
actually TIFF, JPEG, or something else?
I guess my first approach to the problem would be to process each issue
through the conversion programs, extract the images, and then just copy
the "last" image of each issue. Of course, this depends on the naming
scheme for issues and the like. The person would need a little bit of
knowledge of scripting. Given a reasonable setup it shouldn't be too
difficult.
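
Something along these lines might be a starting point. It is only a rough
sketch in Python, not something I have run against real issues. It assumes
pdfimages is installed and on the PATH, that it names its output
<root>-NNN.<ext>, and that the "last" image is simply the highest-numbered
one. The issues/ and last_images/ directories and both function names are
made up for illustration.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumed pdfimages naming: <root>-NNN.<ext>; capture the numeric part.
_INDEX = re.compile(r"-(\d+)\.[A-Za-z0-9]+$")


def last_image(paths):
    """Return the path with the highest image index, or None if none match."""
    numbered = []
    for p in paths:
        match = _INDEX.search(Path(p).name)
        if match:
            numbered.append((int(match.group(1)), Path(p)))
    if not numbered:
        return None
    # Compare on the integer index so that 1000 sorts after 999.
    return max(numbered, key=lambda pair: pair[0])[1]


def extract_last_image(pdf_path, dest_dir):
    """Run pdfimages on one issue and copy its last image into dest_dir."""
    pdf_path = Path(pdf_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "img"
        # -j asks pdfimages to write JPEG-encoded images as .jpg files;
        # other images come out as .ppm or .pbm.
        subprocess.run(["pdfimages", "-j", str(pdf_path), str(root)], check=True)
        found = last_image(Path(tmp).iterdir())
        if found is None:
            return None
        # Name the copy after the issue's PDF, keeping the image extension.
        target = dest_dir / f"{pdf_path.stem}{found.suffix}"
        shutil.copy2(found, target)
        return target


if __name__ == "__main__":
    # Hypothetical layout: one PDF per issue in ./issues
    for pdf in sorted(Path("issues").glob("*.pdf")):
        print(pdf.name, "->", extract_last_image(pdf, "last_images"))
```

Whether the highest-numbered image is really the one wanted depends on how
the PDFs were produced, so it would be worth checking a few issues by hand
first.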
Sounds like an interesting problem.
Jon Gorman