Adobe Content Filter

Earl Young eayoung at bna.com
Fri May 16 10:05:02 EDT 1997


     PDF builds a picture of a document.  It is not "legible" to standard 
     word and string processing software because there are no "words" or 
     "characters" in it.  We successfully convert several different types 
     of PDF files into HTML, SGML, and a couple of other formats, but the 
     process is not pretty.  The interim step is converting PDF to 
     PostScript so that we can turn our Perl scripts loose on it.
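
To give a rough idea of what the text-recovery step looks like once the file is in PostScript, here is a minimal sketch in Python rather than our Perl. It assumes the converted PostScript paints text with plain `(string) show` calls. That is an assumption for illustration only: real converter output often wraps `show` in its own procedures and uses custom font encodings. The function name `extract_shown_text` is hypothetical.

```python
import re

# A PostScript string literal followed by the `show` operator.
# Handles backslash escapes, but not unescaped nested parentheses.
SHOW_RE = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*show\b")

# Backslash-escaped parentheses and backslashes inside a string literal.
ESCAPE_RE = re.compile(r"\\([()\\])")


def extract_shown_text(postscript):
    """Return the text fragments painted by `(string) show`, in file order."""
    fragments = []
    for match in SHOW_RE.finditer(postscript):
        # Turn \( \) \\ back into the literal characters.
        fragments.append(ESCAPE_RE.sub(r"\1", match.group(1)))
    return fragments


if __name__ == "__main__":
    sample = (
        "/F1 12 selectfont\n"
        "72 720 moveto\n"
        "(Hello) show\n"
        "(World \\(PDF\\)) show\n"
    )
    for fragment in extract_shown_text(sample):
        print(fragment)
```

The fragments come out in the order they appear in the file, which is not necessarily reading order. Putting them back together into words and lines is where most of the "not pretty" work goes.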
     
     It is my understanding that the "indexing" that is found on most sites 
     using PDF is done at the time the document is written.  The PDF 
     software builds the index in the process of its work, and the format 
     was not designed to let you add to the index in an automated form 
     after the PDF file is created.
     
     Adobe is not a complete source of information about how to 
     deal with PDF.  They told us that we would not be able to convert PDF 
     into ASCII, etc., but we did.  They are used to thinking about PDF as 
     an output format.  They haven't focused on other ways it can be used.
     
     


______________________________ Reply Separator _________________________________
Subject: Adobe Content Filter
Author:  web4lib at library.berkeley.edu at INTERNET
Date:    5/16/97 1:35 AM


     
     
     I am trying to locate "content filter" software that would allow 
     Microsoft Index Server software to index Adobe PDF files. I 
     called Adobe but got nowhere.
     
     Does anyone know if such software exists? 
     
     Please respond to me and not the list. I'll summarize the 
     replies. Thanks.
     
     
     Alan Gale
     University of Guelph
     


