[Web4lib] Capturing web sites

Raymond Wood raywood at magma.ca
Thu May 19 14:13:02 EDT 2005


On Thu, May 19, 2005 at 01:20:39PM -0400, Doug Payne wrote:
> Ellen McCullough wrote:
> > Catherine Buck Morgan; Sent: Thursday, May 19, 2005 12:12 PM
> > > We are a state documents depository, collecting annual reports,
> > > directories, and other kinds of documents produced by the various
> > > agencies in SC. As you're aware, many of these documents are now
> > > published electronically.
> > >
> > > Some documents are published only as an html website (including
> > > directories and annual reports). Our problem is how to capture
> > > that and store it so it can be accessed down the road. (At this
> > > point, I'm not concerned with accessing it in the year 2038, just
> > > capturing it now.)
> > >
> > > How are other libraries handling this? Are there software
> > > recommendations?
> > >
> > > Thanks,
> > > Catherine.
> > 
> > Catherine, 
> > If you're asking about capturing discrete pages of Web sites, I have
> > a good software program called SnagIt
> > (http://www.techsmith.com/products/snagit/default.asp). It cost
> > about $50.00 and I use it to capture Web pages, sections of Web
> > pages, etc.  There's a feature you can use to capture a scrolling
> > Web page as well, so you get the page in its entirety (vs. just the
> > visible window).  You can save the files in a number of different
> > formats (PNG, JPEG, et al.). I have found it very useful!  By the
> > way, I have no professional affiliation with SnagIt or its parent
> > company, TechSmith!
> > Thank you, 
> > Ellen
>
> Hi Catherine,
> 
> Another tool for capturing web sites is WebWhacker by BlueSquirrel
> Software.
> 
> WebWhacker is at:
> http://www.bluesquirrel.com/products/webwhacker/
> 
> Doug Payne

A cross-platform, free software/open source tool with similar
functionality is 'HTTrack Website Copier':
   http://gnuwin.epfl.ch/apps/httrack/en/index.html

Raymond
-- 
"Be Nice, or Leave - By Order of the Management"
(Sign above door, Black Sheep Inn, Wakefield)
GPG Fingerprint: 2E4D 8605 DD48 E80F F893  1C02 B65D 86D9 3B3C 0E03
Encrypted E-mail Preferred
Bush-whacked 2004! Try to relax and enjoy the Chaos :-)

