HTML->unformatted ascii text converters?

Sean Dreilinger sean at kensho.com
Thu Dec 5 15:15:47 EST 1996


Ola Tony & Web4Lib:

>Does anyone know of any software out there (including the source code -
>in C preferably) that takes HTML documents and outputs straight unformatted
>ASCII text (without any of the HTML tags).  If it includes any of the 
>rudimentary formatting from the document (like centering), that would
>be great.

Have you considered using LaTeX as an intermediate step? There are
scriptable utilities to get you from HTML to LaTeX, and from LaTeX files
you can take your pick of the document formats you'd ultimately like to
output, including lightly formatted ASCII text. You can browse a selection
of these converters from a link page I like:

	<http://www.loria.fr/tex/english/outils.html>

Or look up a CTAN site. UCOP is up in Northern California, not far from
Walnut Creek, so maybe try under macros/latex/ at this URL:

	<ftp://ftp.cdrom.com/pub/tex/ctan/>
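
If all you need is the text with the tags dropped, and none of the
formatting, a direct approach may also be enough. Here is a minimal sketch
in C, not taken from any of the packages above. The function name
strip_tags and its buffer-based interface are my own assumptions for
illustration. It simply skips everything between '<' and '>'; it does not
decode entities like &amp;, and it knows nothing about comments or
centering.

```c
#include <stddef.h>

/*
 * strip_tags: hypothetical helper, for illustration only.
 * Copies `html` into `out`, skipping every character from a '<'
 * up to and including the next '>'. At most outsz - 1 characters
 * are written, and the result is always NUL-terminated when
 * outsz > 0. Returns the number of characters written.
 */
size_t strip_tags(const char *html, char *out, size_t outsz)
{
    size_t n = 0;     /* characters written so far */
    int in_tag = 0;   /* nonzero while between '<' and '>' */

    if (outsz == 0)
        return 0;

    for (; *html != '\0'; html++) {
        if (*html == '<') {
            in_tag = 1;                 /* tag starts: stop copying */
        } else if (*html == '>' && in_tag) {
            in_tag = 0;                 /* tag ends: resume copying */
        } else if (!in_tag && n + 1 < outsz) {
            out[n++] = *html;           /* ordinary text: keep it */
        }
    }
    out[n] = '\0';
    return n;
}
```

Called on the string "<h1>Hello</h1> <b>world</b>" with a large enough
buffer, this leaves "Hello world" in out. Anything fancier (entities,
line wrapping, centering) is where the converters above should earn
their keep.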

I went through library school without ever hearing of LaTeX, so here's an
executive summary in case other list-members are interested: LaTeX is a
FREE document preparation system popular in the academic and scientific
community that performs typesetting above and beyond the average GUI word
processor, and handles the presentation of obscure equations and
micro-layout decisions in an intelligent way. You can grab free
distributions of LaTeX for MacOS, MS-Win, MS-DOS, UN*X, and other platforms
at the CTAN URL above under systems/.

Happy Surfing!
--Sean :-)


                                          Sean Dreilinger, MLIS
        PGP Public Key - http://www.kensho.com/sean/pubring.htm
  sean at kensho.com - 619.514.3939 - http://www.kensho.com/~sean/
KENSHO - Bringing Knowledge to the Information Age - in a Flash

