SGML for Web Pages

Heinrich C. Kuhn kuhn at mpg-gv.mpg.de
Tue Dec 19 19:10:30 EST 1995


I proposed:
> >>And we will have to rely on ink on stable paper
> >>for a *very* long time yet. The only sensible
> >>solution, that I can think about for longterm
> >>archiving "electronic" documents is having a
> >>paper printout of all the documents in a small
> >>number (e.g. one or two on every continent)
> >>of archiving libraries.

Tony Barry objected (not without good reason):
> >You can't print dynamic documents and you can't sensibly try and print
> >documents built into a hypertext matrix and hope to make sense of them.

And Keith Engwall made very reasonable remarks: 
> I agree, to a point.  Electronic documents that are designed to be dynamic
> or part of a whole organism (a hypertextual manuscript) cannot be captured
> properly on paper.  For instance, what do you capture?  The code?  That is
> difficult to interpret and would have to be hand-entered before it could be
> seen as it should be.  The output?  At what time?  We now have electronic
> documents that change as we watch them... we would have to choose an
> arbitrary moment in time to represent this document for archival purposes.
> That is not satisfactory.

It is not satisfactory indeed. But I was thinking about
archiving for centuries, not about archiving for years.
Magnetic tape needs to be rewound quite often to avoid
losing legibility through cross-magnetization within the tape.
And it needs to be copied from one type of tape to another
as the older types of tape reader become obsolete.
I'm rather pessimistic about whether this planet, and especially
its intelligent inhabitants, are in a shape that lets
us be certain there won't be future gaps in the care
of precious magnetic tapes, gaps that might render their content
illegible. Magnetic discs present a rather similar problem.
An analysis by a colleague of mine concluded that
normal CD-ROMs, when stored in a non-optimal manner,
will probably become unreadable in some 25 years. The highest
claim I have ever heard concerning the durability of
CD-ROMs was 100 years (with non-standard facing materials
on a glass carrier); I remain a bit sceptical. Good ink
on good paper gives us several centuries, and microforms
give us at least 100 years.
   I never intended the archiving-by-printouts-on-paper
approach to yield something that is easily legible by humans.
But it yields something that sits on a rather stable carrier
and can be interpreted by some sort of machinery even in
a rather distant future, one that may have experienced a gap in
the care of such material between our time and theirs.
I propose paper just as a backup medium.
   Of course it is rather nonsensical to strive to
document all changes in fast-changing documents, but
a snapshot of the more important of these documents:
why not? And a lot of the documents we would like to
archive in this way tend to become "stable" after some
time.
   Hyperlinks are another problem, but if there were
a small number of central places collecting
the printouts of the documents in question, they might
be able to serve for the reconstruction of a good part
of the web of such documents.
   All this is not satisfactory. But too much optimism
about the stability of present-day technology might
be harmful to *longterm* preservation.


I originally wrote:
> >>I admit that it is not too improbable, that interpreting
> >>GIF- and JPEG-encoded graphics will be a problem after
> >>a quarter of a century,

Tony Barry objected:
> >
> >Not so.  Formats with open standards will have no problem with
> >interpretation.  It's proprietary standards which are not public which might
> >die.

And Keith Engwall wrote on this: 
> Again, I agree... to a point.  Formats with obsolete standards (open or
> proprietary) may easily be lost.  This is far less likely with open
> standards, but it is not outside of the realm of possibility (or even
> likelihood).  

My crystal ball doesn't let me look far enough into the
future to be 100% confident about the survival of all
non-proprietary formats.

I originally wrote:
> >>but I see no such problems with
> >>HTML, as HTML is basically ASCII and ASCII will be readable

Tony Barry objected:
> >HTML just uses _readable_ ascii.  Just because a binary object might use
> >ASCII which does not map directly to printable characters makes no
> >difference in its ability to be interpreted.

And Keith Engwall wrote on this: 
> This is probably the most important point.  Just because we cannot read
> barcodes in grocery stores does not mean that information is not there.
> Similarly, just because non-ascii encoding is not readable without
> interpretation software does not mean it is not readable.  Even image files
> can be scanned for ascii by Optical Character Recognition software.  So
> long as the data is not corrupt and the proper interpreter is used, data
> can be translated from any format to one that is readable, editable, etc.
> ASCII is really no different (it's just that the interpretation occurs for
> us automatically).

Both of you certainly have a point. But I'd still say that
when you use ink on paper for your backups, ASCII (and thus HTML)
might give a better chance of preservation.

   Now I hope that my remarks have made the reasons for
what I originally wrote somewhat more intelligible,
and do not convey the impression
that I'm responding just out of stubbornness.

Regards

Heinrich C. Kuhn
****************************************************
*  Dr. Heinrich C. Kuhn   (coordinator libraries)
*  Max-Planck-Gesellschaft / Generalverwaltung IIb3
*  Postfach 10 10 62
*  D-80084 Muenchen
*
*  voice: +49-89-2108 1563
*  fax:   +49-89-2108 1565
*  eMail: hck at ipp-garching.mpg.de   or
*          kuhn at mpg-gv.mpg.de
******************************************************
  

