processing a web site for paper-based format: INFO NEEDED

Prentiss Riddle riddle at is.rice.edu
Wed Feb 21 22:22:12 EST 1996


> Date: Wed, 21 Feb 1996 07:39:56 -0800
> From: Laura Guy <guy at dpls.dacc.wisc.edu>
> Subject: processing a web site for paper-based format: INFO NEEDED
> 
> i've developed a very large (currently about 40 meg) and 
> complex web site (loads of internal and external links) of
> international data resources (http://dpls.dacc.wisc.edu/iassist/)
> 
> a colleague of mine who is doing work in South Africa asked how
> difficult it would be to move the entire web-site into some sort of
> format (perhaps like postscript) so it could be distributed to 
> those less fortunate resource-wise (many researchers in South Africa
> do not have the computer resources necessary to use our web site).
> 
> i know that one thing i could do would be to sit for hours and
> create individual postscript files from these hundreds of html
> documents, but certainly any hyperlinking would be problematic
> (i guess i'd have to carefully sort these postscript files and print
> them in order) AND of course, any of our links to outside resources
> would be lost.

Splitting the problem into two pieces:

(1) To avoid losing your hyperlinks in hard copy, find a friendly
    hacker to write a short perl script to convert every instance of:

	<a href=""http://blahblah">Blah Blah</a>

    ...to:

	<a href=""http://blahblah">Blah Blah (<tt>http://blahblah</tt>)</a>

    No doubt you'll think of other enhancements to make your HTML more
    hard-copy friendly: explicitly state the URL of each document at
    the beginning, explicitly flag internal <a name="blah"> anchors,
    etc.

    If there are crucial external documents that you've linked to,
    include them in the printed collection (after asking permission or
    assessing them for Fair Use, of course).
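    Here's a minimal sketch of such a script (fixlinks.pl is just a
    name I made up).  It assumes each <a href>...</a> pair sits on a
    single line and that links don't nest; real-world HTML may need
    a smarter parse:

        #!/usr/bin/perl -p
        # Append each link's URL, in <tt>, after the link text so
        # the address survives on paper.
        s{<a href="(http://[^"]*)">([^<]*)</a>}
         {<a href="$1">$2 (<tt>$1</tt>)</a>}gi;

    Run it over a copy of each file, e.g.:

        perl fixlinks.pl doc.html > doc-print.html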

(2) Find a web browser which will convert from HTML to something
    printable using command-line options so you can run through all of
    your HTML files in "batch" mode.  I don't think Netscape will do
    this, but possibly some other GUI Postscript-capable browser will;
    if not, you can always turn your HTML into ASCII with lynx:

	foreach F ( /wherever/iassist/*.html )
	    lynx -dump file://localhost/${F} | lpr
	end

    Note that instead of printing directly you could import every
    document back into a good word processor or desktop publishing
    tool and use it to compile an index mapping URLs to page numbers.
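    If you go that route, here's a sketch along the same lines that
    saves each page as plain text with its URL at the top, ready to
    pull in.  The base URL is my guess from your announcement;
    adjust it and the file glob to match your server layout:

        #!/usr/bin/perl
        # For each HTML file, write a .txt file containing the
        # page's URL followed by lynx's plain-text rendering.
        $base = "http://dpls.dacc.wisc.edu/iassist/";
        foreach $f (</wherever/iassist/*.html>) {
            ($name = $f) =~ s|.*/||;       # strip directory part
            open(OUT, "> $f.txt") || die "can't write $f.txt: $!\n";
            print OUT "URL: $base$name\n\n";
            print OUT `lynx -dump file://localhost$f`;
            close(OUT);
        }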

Good luck.  (I also liked the suggestion that you cut a CD-ROM, but
your target audience seems to require paper.)

And lest people think that these techniques are strictly for a
computerless third-world environment, it seems to me that if refined
they could be useful for archival purposes as well.  I've toyed with
the idea of creating an annual paper snapshot of my CWIS for
preservation in our university library.

-- Prentiss Riddle ("aprendiz de todo, maestro de nada") riddle at rice.edu
-- RiceInfo Administrator, Rice University / http://is.rice.edu/~riddle
-- Home office: 2002-A Guadalupe St. #285, Austin, TX 78705 / 512-323-0708

