[Web4lib] Archiving College or Library websites?

Kozlowski,Brendon bkozlowski at sals.edu
Fri Aug 15 14:17:00 EDT 2008


In the past, when I've wanted to archive (or save) a website and its contents, I've used a spider that technically breaks the rules of the web to do the job (it ignores robots.txt in order to copy EVERY page linked to from a website).  Be aware that this application will consume considerable bandwidth, and if you run it against a website that is not your own, you should request and receive permission beforehand.  With that being said:
 
http://www.httrack.com/
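
For anyone copying a site they do not own, here is a minimal Python sketch of the more polite alternative: work out which same-site links a crawler is allowed to follow according to robots.txt. It is not how HTTrack works internally. The function name, the "archive-sketch" user agent, and the example.edu URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawlable_links(html, base_url, robots_txt, agent="archive-sketch"):
    """Return same-host links in `html` that robots.txt permits for `agent`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())  # parse rules from text, no network
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [
        link for link in collector.links
        if urlparse(link).netloc == host and rules.can_fetch(agent, link)
    ]


if __name__ == "__main__":
    page = '<a href="/about.html">About</a> <a href="/private/x.html">X</a>'
    robots = "User-agent: *\nDisallow: /private/"
    print(crawlable_links(page, "http://www.example.edu/index.html", robots))
```

A real run would also need to fetch the pages and pause between requests to keep the bandwidth use down; that part is left out here.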
 
Also, with that being said...I'm assuming that, as a "Web Manager", you have access to your own website. Unless you were working with the site through a CMS in the first place, you should have root access?  If not, you should probably ask the department or person in charge of it all whether you could get a raw copy of all the files and folders (probably as a TGZ file, since your server is running Apache under Red Hat Linux) plus database exports of the current website, rather than doing a "screen scrape" of the site with HTTrack.
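
If you do get file access, the raw copy could be made along these lines. This is only a sketch: the function name and the paths are hypothetical, and the database export would still have to be done separately with the database's own dump tool.

```python
import tarfile
from pathlib import Path


def archive_site(docroot, out_path):
    """Write a gzip-compressed tar (.tgz) of everything under `docroot`."""
    docroot = Path(docroot)
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname stores paths relative to the document root's own name,
        # so the archive unpacks into a single top-level folder.
        tar.add(str(docroot), arcname=docroot.name)
    return out_path


if __name__ == "__main__":
    # Hypothetical paths; substitute your server's actual document root.
    archive_site("/var/www/html", "site-2008-08-15.tgz")
```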
 
 
 
Brendon Kozlowski
Web Administrator
Saratoga Springs Public Library
49 Henry Street
Saratoga Springs, NY, 12866
[518] 584-7860 x217

________________________________


>Susan E. Edwards wrote:
>> Our current library and college website will be transitioning to a CMS
>> (Drupal), with  a radically different architecture and appearance. Do
>> you have any advice on how to document the current iteration (screen
>> shots? Online exhibition?) as a manifestation of how the college (or
>> library) presented itself at this point in time?
>
>Well, I'd keep a copy of the whole thing handy, natch.  If you were
>transitioning from one CMS to another, I'd say see if there were a
>utility that could spit out the pages as plain old html (and hence no
>longer dependent on a database).
>
>As to continuing to make some vestige of the old site public, I dunno --
>isn't that the job of Archive.org?
>
>LEO
>
>P.S. congrats on the new site!



