[WEB4LIB] Cleanup Word docs converted to HTML

Raymond Wood raywood at magma.ca
Thu Jun 20 10:20:35 EDT 2002


On Thu, Jun 20, 2002 at 06:49:13AM -0700, Dan Lester remarked:
> Many of us are familiar with the vast amounts of style and
> other information that is buried in HTML pages created by
> "Save as HTML" from Word 2000 and related products.

Or what I call 'Save as crack XML'  ;-)

> In the past we've used an MS download called something like
> msofhtml.exe that does a pretty decent job of cleaning out the
> formatting and styles from Word 2000.  However, it appears
> that program won't work under WinXP or with OfficeXP software.
> It seems to require that Office 2000 be installed on the
> machine.
> 
> I've searched the MS site and haven't found a new version.  If
> anyone could offer a pointer to a new version of the MS
> cleanup tool, or to some other tool that will do a similar
> job, I'd appreciate it.

A free utility from the W3C called 'Tidy' will clean up goofy
Word 2000 code.  There is a GUI version for Windows called
'TidyGUI'.
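
For anyone who wants to script the command-line version, here is a
rough sketch in Python.  It assumes a reasonably recent HTML Tidy
build is installed as `tidy` on the PATH and that it accepts the
`word-2000`, `clean` and `bare` configuration options; check the
documentation for your own build.  The function names and file
names are made up for illustration.

```python
import shutil
import subprocess


def build_tidy_command(src, dest):
    """Return an argument list asking Tidy to strip Word 2000 markup.

    Assumes the Tidy build supports these configuration options.
    """
    return [
        "tidy",
        "--word-2000", "yes",  # drop Word 2000 specific markup
        "--clean", "yes",      # replace presentational tags with CSS rules
        "--bare", "yes",       # strip remaining Microsoft-specific HTML
        "-o", dest,            # write the cleaned page to dest
        src,
    ]


def clean_word_html(src, dest):
    """Run Tidy on src and write the result to dest (hypothetical helper)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(src, dest),
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 on success, 1 if there were warnings,
    # and 2 if there were errors, so only 2 or above is a failure.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Something like clean_word_html("report.html", "report-clean.html")
would then cover one page, and a loop over a directory would cover a
batch.  Warnings are treated as success here because converted Word
pages nearly always produce some.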

> Meanwhile, I continue to encourage our authors to create pages
> in FrontPage and not in Word, and to clean up the converted
> pages.

No comment  =)

Cheers,
Raymond


