Conversion of many documents to html

R124C41 at aol.com
Fri Dec 22 02:18:20 EST 1995


In regard to the question of Susan A. Kaisaki of the 
Cedars-Sinai Medical Center (kaisaki at csmc.edu) concerning the conversion of
many documents to html, a lot depends on the specific situation.  The
situation described is:

>Our institution will be converting large amounts of 
>documentation (policies, manuals, etc) to HTML on a web 
>server for internal use only. We are looking into using Word's 
>Internet Assistant as the editor. These documents will be 
>edited on an annual basis. We want to do the annual editing in 
>Word.
>
>Do any of you have a similar situation and have any 
>suggestions on the most efficient way to handle this? 

One really has to think through one's specific situation with the following
questions (Q) and your answers (A) in mind:

Q1.  What do you mean by "many"?

Q2.  How quickly do you have to get them done?

Q3.  What are your criteria for adequate conversion?

Q4.  What can you afford?

If A1 is fewer than a few hundred documents, A2 is a few days or less, A3 is
"readable but not perfect," and A4 is "free," then my vote is for the freeware
product called rtftohtml (available by ftp from ftp.cray.com, I believe).
Internet Assistant may be just as adequate.  If so, substitute Internet
Assistant everywhere rtftohtml appears in what follows.

On the Mac, anyway, rtftohtml supports multiple-file "drag and drop," so you
can select all the documents you need converted with Open-Apple-A or whatever,
drag them all over to the rtftohtml application, and then go home while it
converts them one at a time, including making PICT files out of any
figures, inserting hypertext links for those figures, and so on.
The next day you come back, convert the PICT files to GIF with GifConverter,
and you are done, provided you are willing to accept the html it turns out.
Note that the PICT-to-GIF step will be a bit of a grind because GifConverter
doesn't fully support "drag and drop."
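
For anyone working from a command line rather than the Mac desktop, the same
"convert everything in one pass" idea might be sketched roughly as below.  This
is only an illustration under stated assumptions: that the documents have been
saved as .rtf files in one folder, and that the converter can be run as
"rtftohtml <file>" with no other options.  The function names are made up for
this example, so check your own copy of the converter before relying on it.

```python
import subprocess
from pathlib import Path


def build_commands(source_dir, converter="rtftohtml"):
    """Return one command line per .rtf file in source_dir, in sorted order.

    Assumption: the converter accepts a single file name as its only
    argument.  Adjust the list below if your converter needs options.
    """
    rtf_files = sorted(Path(source_dir).glob("*.rtf"))
    return [[converter, str(path)] for path in rtf_files]


def convert_all(source_dir, converter="rtftohtml"):
    """Run the converter on every file, one at a time.

    Returns the list of files whose conversion exited with an error, so
    they can be redone by hand the next morning.  Raises FileNotFoundError
    if the converter program itself cannot be found.
    """
    failures = []
    for command in build_commands(source_dir, converter):
        result = subprocess.run(command)
        if result.returncode != 0:
            failures.append(command[-1])
    return failures
```

Calling convert_all("policies") would then work through a (hypothetical)
"policies" folder unattended; converting the extracted figures to GIF remains
a separate step.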

The html isn't bad--in fact it is very good, I think.

But it may not come up to publication quality.
--------------------------------------------
Suppose instead that your needs are for publication quality; that is, your
needs are most constrained by Q3/A3 and you need the very best conversion.
If this is the case and you are going to have to do this over and over again
every year, then you've got a much bigger problem.

It is my impression that when the set of documents is very large (more than
several thousand) and the constraints on end quality are very high, the result
is generally that the authoring is moved into some sort of SGML context, where
you have much more control over the way production of the documents is going
to occur.  That is, people give up Word and move to an SGML authoring tool.
I have seen this mainly in defense department applications (manuals for
fighter aircraft, etc.).

I am not sure this is justified here, nor is it likely to be accepted by the
people currently producing the documents.
-----------------------------------------
If that is the case and your number of documents is not that great, then
your best bet may be to define some style sheets so that rtftohtml can
do a better job of translating things.  If in addition you are able to
constrain the authors in what styles they adopt, then you may be fairly
successful in achieving decent quality of web presentation.
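
To make the style idea concrete, here is a rough sketch of why a fixed list of
styles helps: each agreed style name maps to exactly one html element, and
anything outside the list is flagged rather than guessed at.  The style names
and the mapping are hypothetical, and this does not reproduce rtftohtml's own
configuration format; it only illustrates the principle.

```python
from html import escape

# Hypothetical agreed list of Word paragraph styles and the html element
# each one should become.  Real documents will need their own list.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
}


def render_paragraph(style, text, mapping=STYLE_TO_TAG):
    """Wrap one paragraph in the element mapped to its style.

    A style outside the agreed list raises ValueError, so stray styles
    are caught during conversion instead of producing odd-looking html.
    """
    if style not in mapping:
        raise ValueError(f"style not in the agreed list: {style!r}")
    tag = mapping[style]
    # escape() protects characters such as & and < in the body text.
    return f"<{tag}>{escape(text)}</{tag}>"
```

The stricter the authors are about sticking to the list, the less hand cleanup
the converted pages should need.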

------------------------------------------------------

Frankly, the way to really get the best result is to devolve the web (html)
document preparation onto the original authors, so that they are responsible
for both the print (Word) version and the web version.  Give them the tools,
such as rtftohtml or Internet Assistant, and teach them how to preview the web
version with Open Local in their browser.  Then it is their problem, not
yours, to produce a web-ready version, which you as editor assemble into a
finished web arrangement.

The above approach is nice because, over time, the web document may become
the primary document and the Word document the derived one, particularly if
your authors begin to make use of hypertext linking.
-----------------
Finally, you ask...

>Does anyone have any experience with any type of document
>management software?

Among other activities at my work, I have spent the last two years or so
looking at some 30 vendor document management products.  We have
purchased a product called MATRIX from a company called ADRA Systems.  It is
nicely cross-platform, running on a lot of Unix boxes as well as Windows,
Windows NT, and (soon, it is claimed) the Mac.  It is object-oriented and very
flexible, and it looks like it can be handled by non-technical people.

In a couple of months, I should know whether or not it was the correct choice
for us.  I will be quite happy to share the result with the list (if the
result isn't what I expect, I may be very interested in the posted openings
:-}).

--David Ritchie
--Naperville, IL
--R124C41 at AOL.COM

