Adobe Content Filter
Earl Young
eayoung at bna.com
Fri May 16 10:05:02 EDT 1997
PDF builds a picture of a document. It is not "legible" to standard
word and string processing software because there are no "words" or
"characters" in it. We successfully convert several different types
of PDF files into HTML, SGML, and a couple of other formats, but the
process is not pretty. The interim step is converting PDF to
PostScript so that we can turn our Perl scripts loose on it.
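
To illustrate why PostScript is easier for string-processing scripts to work with, here is a minimal sketch in Python (not the Perl scripts mentioned above, which are not shown in this message). It assumes the converted PostScript paints text with plain `(string) show` operators. Real converter output often uses other operators, custom procedures, and per-glyph fragments, so treat the function names and the sample input as hypothetical.

```python
import re

# Assumption: text is painted with a literal string followed by `show`.
# Nested unescaped parentheses and octal escapes (\ddd) are not handled.
SHOW_RE = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*show\b")

# A few common PostScript escape sequences; anything else maps to itself.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


def unescape(ps_string):
    """Replace backslash escapes inside a PostScript string literal."""
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(1)),
        ps_string,
    )


def extract_shown_text(postscript):
    """Return the strings passed to `show`, in document order."""
    return [unescape(m.group(1)) for m in SHOW_RE.finditer(postscript)]


# Hypothetical sample input, not taken from a real converted file.
sample = """
/Times-Roman findfont 12 scalefont setfont
72 720 moveto (PDF builds a picture) show
72 706 moveto (of a \\(document\\).) show
"""

print(extract_shown_text(sample))
```

The extracted fragments still have to be reassembled into words and lines using the `moveto` coordinates, which is one reason the process is "not pretty."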
It is my understanding that the "indexing" that is found on most sites
using PDF is done at the time the document is written. The PDF
software builds the index in the process of its work, and the format
was not designed to let you add to the index in an automated form
after the PDF file is created.
Adobe is not a complete source of information about how to
deal with PDF. They told us that we would not be able to convert PDF
into ASCII, etc., but we did. They are used to thinking about PDF as
an output format. They haven't focused on other ways it can be used.
______________________________ Reply Separator _________________________________
Subject: Adobe Content Filter
Author: web4lib at library.berkeley.edu at INTERNET
Date: 5/16/97 1:35 AM
I am trying to locate "content filter" software that would allow
Microsoft Index Server software to index Adobe PDF files. I
called Adobe but got nowhere.
Does anyone know if such software exists?
Please respond to me and not the list. I'll summarize the
replies. Thanks.
Alan Gale
University of Guelph