[WEB4LIB] link checking and google hunting
Raymond Wood
raywood at magma.ca
Wed Jun 12 14:24:56 EDT 2002
On Wed, Jun 12, 2002 at 10:23:18AM -0700, Jim Jacobs remarked:
> I have several thousand links to check and am, naturally, finding a
> high percentage of broken links. This leads me to two questions, one
> old, one (I think) new:
>
> 1. Has anyone found the definitive, works-every-time, always-
> correct, wouldn't-use-anything-else link checking software? :-)
> Features I'm interested in would include:
> - inclusion of <title> of page found in report.
> - accurate way of dealing with <refresh> tags.
> - accurate way of dealing with load-balancers that redirect to
> different, but correct, machine.
> - easy, accurate way to re-check bad links to verify they are
> really bad and not just unreachable at the moment of last
> check.
> - follows links on existing web site and extracts urls to check
> from existing html files.
> - runs on unix, preferably.
>
> I've used, at various times, MOMspider, webxref, and linklint.
>
> 2. Has anyone experimented with using google or the new google API
> to track down new URLs for bad links? Specifically, has anyone
> integrated a google-search and result-report into a link
> checker?
>
> I'll happily accept any advice, condolences, pointers to sources of
> comparison studies, recommendations, etc.
re: #1:
These are worth a look - YMMV:
For *nix:
linbot
linkchecker
For *doze:
Xenu's Link Sleuth <-- this one puts a goofy banner ad in the
HTML report, but otherwise works OK.
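
If you end up rolling your own instead, the core loop is pretty small.
Here's a rough sketch in Python (my choice of language; the urls.txt
filename and the User-Agent string are just placeholders, not anything
standard). urllib follows ordinary HTTP redirects on its own, so a
load-balancer bounce to a different-but-correct machine still comes back
as a 200 plus whatever <title> the final page carries; <meta refresh>
tags are NOT handled here and would need their own parsing. A failed
fetch gets one re-try after a pause before it's written off as dead:

#!/usr/bin/env python3
# Rough link-check sketch -- not a finished tool.
import re
import time
import urllib.error
import urllib.request

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)

def check(url, retries=1, delay=10):
    """Return (status, title-or-error) for one URL."""
    for attempt in range(retries + 1):
        try:
            req = urllib.request.Request(
                url, headers={"User-Agent": "linkcheck-sketch"})
            # urlopen follows 30x redirects by itself, so a bounce to
            # another (correct) host still reports the final page.
            with urllib.request.urlopen(req, timeout=30) as resp:
                head = resp.read(65536).decode("utf-8", "replace")
                m = TITLE_RE.search(head)
                return resp.status, (m.group(1).strip() if m else "(no title)")
        except urllib.error.HTTPError as err:
            return err.code, str(err.reason)   # 404 etc.: a definite answer
        except (urllib.error.URLError, OSError) as err:
            if attempt < retries:
                time.sleep(delay)              # maybe just unreachable right now
            else:
                return None, str(err)          # still dead after the re-check

if __name__ == "__main__":
    with open("urls.txt") as fh:               # one URL per line
        for url in (line.strip() for line in fh if line.strip()):
            status, info = check(url)
            print("%s\t%s\t%s" % (status, url, info))

Walking an existing site and extracting the URLs to feed it is the part
the tools above already do well, so I'd only bother with something like
this for the reporting side (title in the report, re-check before
flagging).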
HTH,
Raymond
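
P.S. re: #2 -- short of the new Google API, one cheap trick is to
remember the <title> from the last good check of a URL and, when it
goes dead, turn that title into a plain Google search URL for the
report. Just a hand-built query string, not the API itself; quote_plus
is standard-library Python and google_hunt is a made-up name:

import urllib.parse

def google_hunt(title):
    # Plain query-string search URL, pasted into the report next to
    # the dead link so a human can chase down where the page went.
    return "http://www.google.com/search?q=" + urllib.parse.quote_plus(title)

print(google_hunt("Some Page Title That Went Missing"))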