Text extraction from PDF files.

Tony Parsons tony.parsons at racgp.org.au
Wed Sep 5 21:55:18 EDT 2001


Dear all,

This is only a vaguely web-related question, as we'd be using email to
disseminate the information once this problem is solved. I hope it's not
too inappropriate; I'm just not yet aware of many library computing-type
lists.

Does anyone know how to extract plain text from a PDF file? I have scanned
some documents with Adobe Exchange, which we would like to convert into
text. I've done a bit of hunting around for conversion software without
much luck. The Ghostview/Ghostscript programs seem to extract no text from
the PDFs I've scanned.

Should I just give up and organise a different method of scanning, or is
there a *reasonably* straightforward way of doing this?

Regards
Tony.
--
Tony Parsons
Technical Services Librarian
Royal Australian College of General Practitioners - Resource Centre
Ph  (03) 9214 1487
Fax (03) 9214 1403
http://www.racgp.org.au



More information about the Web4lib mailing list