[WEB4LIB] Re: Cleaning up WORD HTML

Karen Harker Karen.Harker at UTSouthwestern.edu
Wed Feb 20 09:45:25 EST 2002


Dreamweaver includes an extension that does this quite well and easily.  Simply import the file as Word HTML and it will remove all the extraneous HTML, "webbots", and XML.  Removing the XML may or may not turn out to be a good thing in the future.  I can't remember what schema they use: if it's Dublin Core, then you may not want it removed; if it's their own proprietary schema (which, knowing MS, is probably the case), then it doesn't seem worth keeping.
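
For anyone without Dreamweaver, the kind of cleanup described above can be roughly approximated in a script. The following is only a minimal sketch, not Dreamweaver's actual algorithm: the function name clean_word_html and the specific patterns (conditional comments, "webbot" comments, <xml> blocks, namespaced tags such as <o:p>, Mso classes, mso- styles) are assumptions about what typical Word-generated HTML contains.

```python
import re

def clean_word_html(html: str) -> str:
    """Strip some common Word-specific markup (hypothetical helper, illustrative only)."""
    flags = re.S | re.I
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=flags)
    # "webbot" comments
    html = re.sub(r"<!--\s*webbot.*?-->", "", html, flags=flags)
    # Embedded <xml> ... </xml> blocks
    html = re.sub(r"<xml\b.*?</xml>", "", html, flags=flags)
    # Namespaced Office tags such as <o:p> and </o:p>
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=flags)
    # class=MsoNormal and similar class attributes
    html = re.sub(r'\s+class="?Mso\w*"?', "", html, flags=flags)
    # Whole style attributes that contain an mso- declaration
    # (this also drops any non-mso declarations in the same attribute)
    html = re.sub(r"\s+style=(\"[^\"]*mso-[^\"]*\"|'[^']*mso-[^']*')", "", html, flags=flags)
    return html

sample = ('<p class=MsoNormal style="mso-margin-top-alt:auto">Hello<o:p></o:p></p>'
          '<!--[if gte mso 9]><xml><o:x/></xml><![endif]-->')
print(clean_word_html(sample))
```

Regular expressions are a blunt tool for HTML, so the result would still need checking by eye; a real parser would be the safer choice for anything beyond simple handouts.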




Karen R. Harker, MLS
UT Southwestern Medical Library
5323 Harry Hines Blvd.
Dallas, TX  75390-9049
214-648-1698
http://www.swmed.edu/library/

>>> Jim Rible <Rible at sou.edu> 2/19/02 9:47:23 PM >>>
I convert a lot of Word documents and there are two more steps I follow
after the "Clean Up Word HTML".  

2. File >> Convert >> 3.0 Compatible Browser (then select the "CSS
Styles to HTML Markup")
3. Modify >> Page Properties >> Document Encoding (then change it from
"Windows 1252" to "Western (Latin 1)").

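The encoding change in step 3 can also be done outside Dreamweaver. This is a rough sketch under my own assumptions, not a description of what Dreamweaver does internally: the function name cp1252_to_latin1 is hypothetical, and I assume the file declares its charset with the literal label "windows-1252". Characters that exist in Windows-1252 but not in Latin-1 (curly quotes, for example) are written out as numeric character references so nothing is lost.

```python
import re

def cp1252_to_latin1(raw: bytes) -> bytes:
    """Re-encode Windows-1252 HTML bytes as Latin-1 (hypothetical helper)."""
    # Decode as Windows-1252; undefined bytes become U+FFFD instead of raising
    text = raw.decode("cp1252", errors="replace")
    # Update the declared charset label, if the document has one
    text = re.sub(r"windows-1252", "iso-8859-1", text, flags=re.I)
    # Anything Latin-1 cannot represent becomes a numeric character reference
    return text.encode("latin-1", errors="xmlcharrefreplace")

raw = b"<meta charset=windows-1252>\x93Hi\x94 caf\xe9"
print(cp1252_to_latin1(raw))
```
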
Jim Rible
Systems Librarian
Southern Oregon University
rible at sou.edu 


>>> "Drew, Bill" <drewwe at MORRISVILLE.EDU> 02/05/02 05:48PM >>>

Any suggestions for cleaning up Word documents saved as HTML?  I use the
Commands within Dreamweaver, and the code looks better than it was but
is still very dirty.  Any suggestions other than getting them to use
something else?  We are mounting handouts originally written in Word
onto our web.


Wilfred (Bill) Drew
Associate Librarian, Systems and Reference
SUNY Morrisville College Library
E-mail: mailto:drewwe at morrisville.edu 
BillDrew.Net: http://billdrew.net/ 
Not Just Cows: http://people.morrisville.edu/~drewwe/njc/ 
Library: http://library.morrisville.edu 
Wireless Librarian: http://people.morrisville.edu/~drewwe/wireless/ 
SUNY Morrisville College: America's Most Wired 2 Year College - 2001, 2000





