[WEB4LIB] Cleanup Word docs converted to HTML
Raymond Wood
raywood at magma.ca
Thu Jun 20 10:20:35 EDT 2002
On Thu, Jun 20, 2002 at 06:49:13AM -0700, Dan Lester remarked:
> Many of us are familiar with the vast amounts of style and
> other information that is buried in HTML pages created by
> "Save as HTML" from Word 2000 and related products.
Or what I call 'Save as crack XML' ;-)
> In the past we've used an MS download called something like
> msofhtml.exe that does a pretty decent job of cleaning out the
> formatting and styles from Word 2000. However, it appears
> that program won't work under WinXP or with OfficeXP software.
> It seems to require that Office 2000 be installed on the
> machine.
>
> I've searched the MS site and haven't found a new version. If
> anyone could offer a pointer to a new version of the MS
> cleanup tool, or to some other tool that will do a similar
> job, I'd appreciate it.
A free utility from the W3C called 'Tidy' will clean up goofy
Word 2000 code. There is a GUI version for Windows called
'TidyGUI'.
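
To give a rough idea of the kind of clutter involved, here is a minimal
Python sketch that strips a few of the more common Word 2000 leftovers
from a page. It is not how Tidy works internally and it covers far less
than Tidy does; the function name strip_word_markup and the choice of
patterns are my own assumptions, for illustration only.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove some common Word 2000 'Save as HTML' residue (illustrative only)."""
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(
        r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->",
        "",
        html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Drop Office-namespaced tags such as <o:p> and </o:p>, keeping any text inside
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class attributes whose value starts with "Mso" (e.g. MsoNormal)
    html = re.sub(
        r"""\s+class=(?:"Mso[^"]*"|'Mso[^']*'|Mso\w*)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Drop quoted inline style attributes
    html = re.sub(
        r"""\s+style=(?:"[^"]*"|'[^']*')""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    return html


sample = '<p class="MsoNormal" style="margin:0in">Hello<o:p></o:p></p>'
print(strip_word_markup(sample))
```

Regular expressions are a blunt instrument for HTML, so for real pages
a proper parser or Tidy itself is the safer route.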
> Meanwhile, I continue to encourage our authors to create pages
> in FrontPage and not in Word, and to clean up the converted
> pages.
No comment =)
Cheers,
Raymond