[WEB4LIB] Cleanup Word docs converted to HTML

Araby Greene araby at unr.edu
Thu Jun 20 10:29:00 EDT 2002


The HTML filter seems to be built into Office/Word XP.

From the File menu, do not select Save as Web Page.
Instead:
Save As [new filename]
In the Save as Type box, click Web Page, Filtered.
Ignore the warning....

The filter doesn't get rid of all the added junk, but it creates a file about
half the size of one made with "Save as Web Page." HTML Tidy will get rid of
the rest.

When somebody sends me a Word file, I've had pretty good results by opening
it in Word and saving it as rich text, which can be inserted into an empty
FrontPage page with minimal cleanup afterwards.
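
For anyone who wants to script part of the remaining cleanup, here is a
rough Python sketch of the kind of thing a cleanup pass removes. It is not
how the Word filter or HTML Tidy works internally. The function name
strip_word_markup is made up for illustration, and the patterns only cover
markup that Word-generated HTML commonly contains (conditional comments,
o:/v:/w: tags, mso-* style properties, Mso* class names). Regular
expressions are a blunt tool for HTML, so treat it as a first pass only.

```python
import re

# Hypothetical helper for illustration; not part of Word, FrontPage, or HTML Tidy.
def strip_word_markup(html: str) -> str:
    # Remove conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Remove Office namespace tags (<o:p>, <v:shape>, <w:...>), keeping any text inside
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove mso-* declarations from style attributes and style blocks
    html = re.sub(r"""\s*mso-[a-z\-]+\s*:[^;"']*;?""", "", html,
                  flags=re.IGNORECASE)
    # Remove style attributes that are now empty
    html = re.sub(r"""\s+style=("\s*"|'\s*')""", "", html, flags=re.IGNORECASE)
    # Remove Word class names such as class=MsoNormal
    html = re.sub(r"""\s+class=(["']?)Mso[A-Za-z0-9]*\1""", "", html)
    return html


sample = "<p class=MsoNormal style='mso-margin-top-alt:auto'>Hello<o:p></o:p></p>"
print(strip_word_markup(sample))  # prints <p>Hello</p>
```

Running the result through HTML Tidy afterwards is still a good idea, since
this sketch does not repair unbalanced or invalid markup.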

The worst problem I have with Word docs is that people tend to use their
"favorite" fonts instead of standard web fonts. I usually have to remove all
formatting and start over with styles, etc.
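
The font removal can also be partly scripted. Below is a minimal Python
sketch, assuming the unwanted fonts show up as <font> tags and as
font-family declarations in inline style attributes. The names strip_fonts
and _clean_style are hypothetical, and fonts set in a <style> block or an
external stylesheet are not touched.

```python
import re

def _clean_style(match: re.Match) -> str:
    # Rebuild one style attribute without its font-family declarations
    quote, css = match.group(1), match.group(2)
    kept = [d.strip() for d in css.split(";")
            if d.strip() and not d.strip().lower().startswith("font-family")]
    if not kept:
        return ""  # nothing left, so drop the whole attribute
    return " style=" + quote + ";".join(kept) + quote

# Hypothetical helper for illustration only.
def strip_fonts(html: str) -> str:
    # Remove <font ...> and </font> tags but keep the text they wrap
    html = re.sub(r"</?font\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove font-family from quoted inline style attributes
    return re.sub(r"""\s+style=(["'])(.*?)\1""", _clean_style, html,
                  flags=re.DOTALL | re.IGNORECASE)


sample = """<p style='font-family:"Comic Sans MS";color:red'><font face="Papyrus">Hi</font></p>"""
print(strip_fonts(sample))  # prints <p style='color:red'>Hi</p>
```

After that, the page's own stylesheet can supply the standard web fonts.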

-araby greene

_________________________________
Araby Greene
Web Development Librarian
Getchell Library/322
University of Nevada, Reno
araby at unr.edu
http://www.library.unr.edu/

----- Original Message -----
From: "Dan Lester" <dan at riverofdata.com>
To: "Multiple recipients of list" <web4lib at webjunction.org>
Sent: Thursday, June 20, 2002 6:52 AM
Subject: [WEB4LIB] Cleanup Word docs converted to HTML


> Many of us are familiar with the vast amounts of style and other
> information that is buried in HTML pages created by "Save as HTML"
> from Word 2000 and related products.
>
> In the past we've used an MS download called something like
> msofhtml.exe that does a pretty decent job of cleaning out the
> formatting and styles from Word 2000.  However, it appears that
> program won't work under WinXP or with OfficeXP software.  It seems to
> require that Office 2000 be installed on the machine.
>
> I've searched the MS site and haven't found a new version.  If anyone
> could offer a pointer to a new version of the MS cleanup tool, or to
> some other tool that will do a similar job, I'd appreciate it.
>
> Meanwhile, I continue to encourage our authors to create pages in
> FrontPage and not in Word, and to clean up the converted pages.
>
> cheers
>
> dan
>
>
> --
> Dan Lester, Data Wrangler  dan at RiverOfData.com 208-283-7711
> 3577 East Pecan, Boise, Idaho  83716-7115  USA
> www.riverofdata.com  www.gailndan.com  Stop Global Whining!
> Be competitive, intense, and accountable.
>
>




