[Web4lib] Capturing web sites

Thomas Bennett bennetttm at appstate.edu
Mon May 23 04:59:22 EDT 2005


Ditto on HTTrack.  I recently used it to back up our entire web site,
because within the next two weeks I will be moving the servers into the
new library ( http://www.library.appstate.edu/newlibrary/ ).

My web server is Zope, and it connects to a PostgreSQL database for
certain features.  HTTrack saved the dynamically created pages as static
pages, although a couple of Electronic Resources pages were missed.
That was due to my settings: the timeout was too short and the number of
simultaneous connections was set too high.  I burned the resulting files
to DVD so they can be mounted on a server in another building, and that
server will display our site (or as much of it as possible) during the
move.
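
If you want to script the capture rather than run it by hand, something
like the following should work.  This is only a rough sketch; the output
directory and the numeric limits below are made-up examples rather than
the values I actually used, so adjust them for your own site.

import subprocess

# Mirror the site with a longer timeout and fewer simultaneous connections
# so that slow, dynamically generated pages are not skipped.
cmd = [
    "httrack", "http://www.library.appstate.edu/",
    "-O", "/backups/library-mirror",   # output directory (example path only)
    "--timeout=120",                   # seconds to wait on a slow link before giving up
    "--sockets=2",                     # limit simultaneous connections
    "--retries=3",                     # retry links that time out the first time
]
if subprocess.call(cmd) != 0:
    print("HTTrack reported problems; check hts-log.txt in the mirror directory.")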

Another nice feature is that you can add a footer to every page in the
captured copy.  I inserted a footer saying that some forms would not
work properly on the temporary server.  The forms won't work because
they need to connect to the database server, which is on the same
machine as the web server.
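
For command-line runs, I believe the footer can be supplied with
HTTrack's -%F switch (check httrack --help to confirm the exact syntax
for your version).  A sketch, with placeholder wording for the footer
text:

import subprocess

# Re-run the mirror with a footer added to every saved page.
# The footer HTML below is placeholder wording, not our actual notice.
subprocess.call([
    "httrack", "http://www.library.appstate.edu/",
    "-O", "/backups/library-mirror",   # same example output directory as above
    "-%F", "<p>Temporary copy: forms that rely on the database will not work.</p>",
])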

Thomas


On Thu, 2005-05-19 at 18:23, Eric Gustafson wrote:
> The hard part as I see it is to maintain a link/trail between web-based
> documents and the others you collect.  The only possible method I can
> think of offhand would be to convert everything into a paperless document
> management system such as Laserfiche (www.laserfiche.com).
> For simply creating a local copy of web pages (a.k.a. offline browsing),
> I use HTTrack (http://www.httrack.com/) - it's free and there are flavors
> for various OSes.  HTTrack (and most other software of this nature) leaves
> you with a copy that has the links rewritten, so a 'true' copy is not
> really available.  What it does do is let you store or transfer the pages
> wherever you wish (e.g., burn them to a CD) and then view them in your web
> browser.  Other packages don't leave you with a slew of HTML files and
> GIFs; instead they create a database of the pages.  The catch with those
> is that you usually need a special viewer.
> 
> I hope that's not too much drivel. <grin>
> Eric
> 
> 
> Eric Gustafson, Computer Support Technician
> Library, Lane Community College
> 4000 E. 30th Ave
> Eugene, OR 97405
> gustafsone at lanecc.edu
> 541.463.5277
> http://www.lanecc.edu/library/
> 
> >>> Catherine Buck Morgan <catherine at leo.scsl.state.sc.us> 05/19/2005 9:11:56 AM >>>
> (Please excuse the cross-posting.)
> 
> We are a state documents depository, collecting annual reports, 
> directories, and other kinds of documents produced by the various 
> agencies in SC. As you're aware, many of these documents are now 
> published electronically.
> 
> Some documents are published only as an HTML website (including 
> directories and annual reports). Our problem is how to capture that and 
> store it so it can be accessed down the road. (At this point, I'm not 
> concerned with accessing it in the year 2038, just capturing it now.)
> 
> How are other libraries handling this? Are there software recommendations?
> 
> Thanks,
> Catherine.
> -- 
> 
> Catherine Buck Morgan
> Director, Information Technology Services
> South Carolina State Library
> EMAIL: catherine at leo.scsl.state.sc.us 
> Phone: 803.734.8651 Fax: 803.734.4757
> Home page: http://www.statelibrary.sc.gov
> Web catalog: http://www.statelibrary.sc.gov/scslweb/welcome.html
> E-Rate info: http://www.statelibrary.sc.gov/erate.html
> 
> Systems librarianship is the art and science of combining the principles 
> of librarianship with the abilities of computing technology. --Eric 
> Lease Morgan
> 


