[WEB4LIB] RE: help on capturing links?

Dan Lester dan at riverofdata.com
Mon May 7 15:17:57 EDT 2001


Thursday, May 03, 2001, 6:36:23 PM, you wrote:

ME> We are using a proxy server on our public catalog PCs, and also want to
ME> provide access to our subscription databases on these machines.  One of
ME> these databases is CollegeSource, which has links to a very large number of
ME> college web sites.  Is there any software, preferably free or shareware,
ME> that will spider a site and return a text-file list of the outbound links on
ME> that site?

I've not seen an answer to this, so I'll offer a couple of comments.

First, I don't see why the proxy server is an issue here.
CollegeSource runs through ours just fine.

Second, though it has links to many college sites, you could also get
those same links from Yahoo or other sources.  Are you trying to build
a page of college links?  Also, the college catalogs themselves appear
to be coming from the collegesource server, not from the institutional
websites.  If they didn't do it that way, they'd not have the control
they need.

Finally, as to the question itself: there are a host of programs that
can harvest the pages in a website, but I'm not sure that any of them
will work on an ASP-based site like CollegeSource.
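(For illustration only: the link-harvesting step those programs perform
can be sketched in a few lines of Python.  This is a minimal sketch of
the general idea, not any particular package; it assumes you can fetch
the page HTML yourself, and it treats "outbound" as "pointing at a
different host than the page it appears on.")

```python
# Sketch: pull outbound links out of one fetched HTML page.
# "Outbound" here means the link's host differs from the page's host.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag seen."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def outbound_links(page_url, html):
    """Return absolute URLs in `html` whose host differs from page_url's."""
    parser = LinkParser()
    parser.feed(html)
    page_host = urlparse(page_url).netloc
    found = []
    for href in parser.links:
        absolute = urljoin(page_url, href)  # resolve relative hrefs
        if urlparse(absolute).netloc not in ("", page_host):
            found.append(absolute)
    return found
```

To build the text-file list the original poster asked about, you would
run this over each page of the crawl, collect the results into a set,
and write the sorted set out one URL per line.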

dan

-- 
Dan Lester, Data Wrangler  dan at RiverOfData.com
3577 East Pecan, Boise, Idaho  83716-7115 USA
www.riverofdata.com  www.postcard.org  www.gailndan.com 
