"Archiving" e-journals

Aaron Bradley abradley at bowest.awinc.com
Thu Apr 11 17:04:00 EDT 1996


Bill, Hal, et al.:

I really don't believe that local mirroring of electronic journals
is viable or desirable, for a variety of reasons.  By this I'm
referring to the large-scale, large-scope transfer and local
archiving of electronic resources that Bill and Hal seem
to be talking about:  I think that the mirroring of a few, specific
journals is possible and beneficial.

First of all, as Bill and Hal have found, the logistics can be
horrible.  Plain text or binary files (like documents in PostScript)
are fairly straightforward, but once you start dealing with
any sort of complex HTML files you're bound to have to edit
them manually to clear up the referencing.  And wait until
more sites start using Java:  it will make writing the 
data-fetching robot a nightmare.  And there's the whole
resource allocation problem of dealing with such
huge amounts of data, as Bill has just brought up
in a post as I'm writing this:
>web site.  I started downloading around 10am.  It's now 3:50pm and 
>it's still downloading the same site.  The depth to which it 

Secondly, I think a better use of Web resources  -- both
locally and network wide -- is to work on efficient indexing
and tracking of remote electronic journals.  That is to say,
it's better to have a large index of Internet journals rather
than a large collection of them.  Indexing is a task that can
be accomplished relatively easily -- at least in comparison
with trying to make large chunks of data site-conformant.
If one is going to engage in that sort of manual labour it's
probably better applied to cataloguing of journals.

Finally, the copyright issue alluded to in the original post
could be an insurmountable hurdle.  True, it's relatively
easy to assume that any journal freely available full-text
wouldn't mind having you mirror it, but you couldn't
be sure:  unless there was a stated release you'd have
to check with individual  producers on a case-by-case
basis.   This is simply untenable for a large number of
journals.

The primary drawbacks of relying on remote sites are
that a site may become temporarily or permanently
unavailable, or that its resources may move to another
server or directory structure.

It is in regard to availability that I think mirroring of
a few specific journals is worthwhile and tenable,
and maybe this is what Bill and Hal were actually
referring to.  Even one mirror site of an electronic
journal assures a lot better chance of access than
one site alone.  A mirror site often provides a
geographically distant server, which enables the
user to select the most efficient site.  You still
have the logistical problems I described earlier,
but you have them on a smaller scale.  If you can't
work out a common format with the information
producer, you can at least write a script, or even
a word-processing macro, that will make the mirrored
data site-compliant:  you can afford to invest some
time in this if you're only dealing with a couple of
journals.  I think that such mirroring arrangements
would work especially well between subject specialist
libraries and electronic information producers in the
library's subject field, being obviously mutually
beneficial.
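
As a rough illustration of the kind of script meant here, the
following Python sketch rewrites links that point at the
producer's server so that they point at the local mirror instead.
The function name, the example hosts and the /mirror/journal/
path are all made up for the example, and a regular expression
is only a crude stand-in for a real HTML parser.

```python
import re

# Matches href="..." or src="..." attributes, single or double quoted.
# A regular expression is a simplification; it will miss unquoted values.
_LINK_ATTR = re.compile(
    r'''(?P<attr>\b(?:href|src))\s*=\s*(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def rewrite_links(html, remote_base, local_base):
    """Point links under remote_base at local_base; leave other links alone.

    remote_base and local_base are hypothetical values supplied by the
    mirror operator, e.g. the producer's URL prefix and a local path.
    """
    def fix(match):
        url = match.group("url")
        if url.startswith(remote_base):
            # Swap the remote prefix for the local one, keep the rest.
            url = local_base + url[len(remote_base):]
        quote = match.group("q")
        return f'{match.group("attr")}={quote}{url}{quote}'

    return _LINK_ATTR.sub(fix, html)


if __name__ == "__main__":
    page = '<a href="http://journal.example.org/v1/n2.html">Issue 2</a>'
    print(rewrite_links(page, "http://journal.example.org/", "/mirror/journal/"))
```

Links to other hosts and relative links are left untouched, so
this only handles the simplest case of absolute references back
to the original server.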

As to the mutability of pages, both from directory
to directory and site to site,  it's really an issue that
affects all information retrieval on the Internet, and as
such I don't believe that massive mirroring is the
answer.  And there's the simple fact that if you're
going to lose track of it for linking, you're also going
to lose track of it for mirroring.  You could end
up in the dreadful position where one site holds
a complete archive and another only a few issues,
and still another a different selection:  you're never
sure whether you're looking at all the issues of
those titles available.  This has already happened
for quite a few journals.  Once again, I think human
and system resources are better spent in tracking
changes to remote resources, rather than trying
to replicate them.  It's an issue that becomes more
pressing every day in the indexing, and particularly
cataloguing, of academically-oriented electronic
texts.  I think there comes a point where you have
to accept the fact that you can't guarantee the
continued existence of any resource on the 'net,
and that the best thing you can do is try to find
mechanisms for coping with the plastic nature of
your references.
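
One possible coping mechanism is a periodic check of every URL
in an index.  The Python sketch below is only an assumption
about how that might look, not a description of any existing
tool:  the function names and the four status labels are
invented for the example.  It asks each server for the headers
of a page and reports whether the page is still there, has been
redirected elsewhere, or cannot be reached at all.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (status_code, final_url), or (None, None) if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects; geturl() reports where we ended up.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error such as 404.
        return err.code, url
    except (urllib.error.URLError, OSError):
        # No answer at all: DNS failure, refused connection, timeout.
        return None, None


def classify(url, status, final_url):
    """Turn a fetch result into one of four hypothetical labels."""
    if status is None:
        return "unreachable"
    if status >= 400:
        return "gone"
    if final_url != url:
        return "moved"
    return "ok"


def check_index(urls, fetch=fetch_status):
    """Map each indexed URL to its label; fetch can be swapped for testing."""
    return {url: classify(url, *fetch(url)) for url in urls}
```

A "moved" result at least tells the indexer where to look next,
while "gone" and "unreachable" flag entries that need a human
to follow up.  It does nothing for a resource that moves
without leaving a redirect behind.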

I look forward to continued discussion of this issue!
Bill, I'm particularly interested in finding out on
exactly what scale you envision mirroring, as
it's not quite clear to me.

Regards,
Aaron Bradley
abradley at bowest.awinc.com
(BTW - http://www.cfcsc.dnd.ca/links/per/pera.html - my
sad list of journals in my subject area)


