HTML->unformatted ascii text converters?

Ian Tilsed I.J.Tilsed at exeter.ac.uk
Thu Dec 5 11:02:18 EST 1996


On Wed, 4 Dec 1996 15:44:53 -0800 Anthony Toyofuku wrote:

> Does anyone know of any software out there (including the source
> code - in C preferably) that takes HTML documents and outputs
> straight unformatted ASCII text (without any of the HTML tags).  If
> it includes any of the rudimentary formatting from the document
> (like centering), that would be great.

In addition to what has been mentioned already, I sometimes use a 
macro in MS Word that strips the HTML coding from text.  I read about 
it somewhere, although the exact reference escapes me.  The macro text 
is:

Sub MAIN
EditFind .Find = "\<*\>", .Direction = 0, .MatchCase = 0, .WholeWord = 0, .PatternMatch = 1, .SoundsLike = 0, .Format = 0, .Wrap = 2
StartOfDocument
While EditFindFound()
 RepeatFind
 EditClear
Wend
End Sub
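
For anyone without Word to hand, the idea behind the macro (find anything 
from a "<" to the next ">" and delete it) can be sketched in a few lines 
of script.  This is only a rough illustration and not part of the macro 
above: the name strip_tags is made up for the example, and it assumes 
that no tag contains a literal ">" inside an attribute value.  It leaves 
entities such as &amp; untouched and makes no attempt to keep any 
formatting such as centering.

```python
import re

# Rough counterpart of the Word pattern "\<*\>": a "<", then any run of
# characters that are not ">", then the closing ">". The negated class
# also matches newlines, so tags split across lines are removed too.
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    """Delete everything that looks like an HTML tag (hypothetical helper)."""
    return TAG_PATTERN.sub("", html)


if __name__ == "__main__":
    sample = "<h1>Title</h1>\n<p>Some <b>bold</b> text.</p>"
    print(strip_tags(sample))
```

Like the macro, this simply deletes tags, so the contents of comments 
that contain ">" or of script blocks would be left behind as text.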


I repeat that I take no credit for the macro - I am just forwarding it 
on.  I hope that it is of some use.

Regards,
Ian Tilsed
--
---------------------------------------------------------------------
Ian Tilsed       	      		          Tel: (01392) 263876
Computing Development Officer (Library)	          Fax: (01392) 263871
University of Exeter UK     E-mail (MIME OK): i.j.tilsed at exeter.ac.uk
	            http://www.ex.ac.uk/~ijtilsed/
---------------------------------------------------------------------