[WEB4LIB] keyword searchable PDF files

Thu Jan 13 17:13:40 EST 2000

Barbara

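It does not seem to be commonly known that PDF files come in three different
varieties: Normal, Image, and Image + Hidden Text. An Image file is only a
picture of each page, so its text is not accessible, i.e. not searchable and
not able to be highlighted, cut and pasted. For a scanned document that keeps
the look of the original page, the text is accessible only in Image + Hidden
Text format, where OCR text sits invisibly behind the page image. (Normal
files contain real text and are searchable too.)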

Files of each type have the same .pdf file extension and look the same when
displayed, so it is not easy to tell them apart. The only way I know of to
see if the text in a PDF file is accessible is to try to highlight it using
the Text Select Tool when viewing the file in Acrobat Reader. If you cannot
do this, then the text is not searchable.
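
For anyone who would rather check a batch of files than open each one, the
same test can be roughly scripted. What follows is only a sketch under stated
assumptions: it assumes the third-party Python library pypdf is installed, the
function names and the 20-character threshold are my own invention, and it
only reports whether text can be extracted, which is what the Text Select
Tool test shows. It does not say which variety of PDF the file is.

```python
"""Rough check for extractable text in PDF files (illustrative sketch only)."""
import sys


def classify_pages(page_texts, min_chars=20):
    """Return one boolean per page: True if the page seems to carry text.

    min_chars is an arbitrary, hypothetical threshold that ignores stray
    characters such as a lone page number.
    """
    return [len((text or "").strip()) >= min_chars for text in page_texts]


def describe(page_texts, min_chars=20):
    """Summarise a document from the text extracted from each page."""
    flags = classify_pages(page_texts, min_chars)
    if not flags:
        return "empty"
    if all(flags):
        return "searchable"
    if not any(flags):
        return "image only"
    return "mixed"


def page_texts_from_pdf(path):
    """Extract the text of every page with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # third-party; imported only when needed

    reader = PdfReader(path)
    return [page.extract_text() for page in reader.pages]


if __name__ == "__main__":
    # Usage: python check_pdf_text.py file1.pdf file2.pdf ...
    for pdf_path in sys.argv[1:]:
        print(pdf_path, "->", describe(page_texts_from_pdf(pdf_path)))
```

A result of "image only" means no text could be extracted from any page, so
keyword searching will not work on that file as it stands. A result of
"mixed" suggests that only some pages carry text.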

Alan Wilson
Director, Information Resources Management
Department of the Parliamentary Library
Parliament House, Canberra, ACT 2600, Australia
--------------------------------------------
Phone: (02) 6277 2570  Fax: (02) 6277 2622
Email: Alan.Wilson at aph.gov.au
DPL website: http://www.aph.gov.au/library/ 

-----Original Message-----
From: Barbara Stewart [mailto:stew at library.umass.edu]
Sent: Friday, January 14, 2000 1:15 AM
To: Multiple recipients of list
Subject: [WEB4LIB] keyword searchable PDF files

My Dept. is looking into scanning tables of contents, which are then
reformatted to comply with MARC tag 505 (we are an Innopac library). I
have been dismayed at the amount of labor this entails, especially when
TOCs are printed all in caps or have some unusual numbering system. I
believe the Library of Congress did a study a few years back with selected
business titles. They attached a PDF file of the table of contents and
added a clickable note in the 505 field, "Table of contents" or something
to that effect. My question is: Is there a way to make scanned PDF files
keyword searchable?

Thanks,

Barbara Stewart
Latin American Cataloger
University of Massachusetts Amherst
Amherst, MA 01003
stew at library.umass.edu