[WEB4LIB] link checking and google hunting
Raymond Wood
raywood at magma.ca
Wed Jun 12 14:24:56 EDT 2002
On Wed, Jun 12, 2002 at 10:23:18AM -0700, Jim Jacobs remarked:
> I have several thousand links to check and am, naturally, finding a
> high percentage of broken links. This leads me to two questions, one
> old, one (I think) new:
>
> 1. Has anyone found the definitive, works-every-time, always-
> correct, wouldn't-use-anything-else link checking software? :-)
> Features I'm interested in would include:
> - inclusion of <title> of page found in report.
> - accurate way of dealing with <refresh> tags.
> - accurate way of dealing with load-balancers that redirect to
> different, but correct, machine.
> - easy, accurate way to re-check bad links to verify they are
> really bad and not just unreachable at the moment of last
> check.
> - follows links on existing web site and extracts urls to check
> from existing html files.
> - runs on unix, preferably.
>
> I've used, at various times, MOMspider, webxref, and linklint.
>
> 2. Has anyone experimented with using google or the new google API
> to track down new URLs for bad links? Specifically, has anyone
> integrated a google-search and result-report into a link
> checker?
>
> I'll happily accept any advice, condolences, pointers to sources of
> comparison studies, recommendations, etc.
re: #1:
These are worth a look - YMMV:
For *nix:
linbot
linkchecker
For *doze:
Xenu's Link Sleuth <-- this one puts a goofy banner ad in the
HTML report, but otherwise works OK.
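
If you end up rolling your own instead, the core loop is pretty small.
Here's a rough sketch in Python (my choice of language; the urls.txt
filename and the User-Agent string are just placeholders, not anything
standard). urllib follows ordinary HTTP redirects on its own, so a
load-balancer bounce to a different-but-correct machine still comes back
as a 200 plus whatever <title> the final page carries; <meta refresh>
tags are NOT handled here and would need their own parsing. A failed
fetch gets one re-try after a pause before it's written off as dead:

#!/usr/bin/env python3
# Rough link-check sketch -- not a finished tool.
import re
import time
import urllib.error
import urllib.request

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)

def check(url, retries=1, delay=10):
    """Return (status, title-or-error) for one URL."""
    for attempt in range(retries + 1):
        try:
            req = urllib.request.Request(
                url, headers={"User-Agent": "linkcheck-sketch"})
            # urlopen follows 30x redirects by itself, so a bounce to
            # another (correct) host still reports the final page.
            with urllib.request.urlopen(req, timeout=30) as resp:
                head = resp.read(65536).decode("utf-8", "replace")
                m = TITLE_RE.search(head)
                return resp.status, (m.group(1).strip() if m else "(no title)")
        except urllib.error.HTTPError as err:
            return err.code, str(err.reason)   # 404 etc.: a definite answer
        except (urllib.error.URLError, OSError) as err:
            if attempt < retries:
                time.sleep(delay)              # maybe just unreachable right now
            else:
                return None, str(err)          # still dead after the re-check

if __name__ == "__main__":
    with open("urls.txt") as fh:               # one URL per line
        for url in (line.strip() for line in fh if line.strip()):
            status, info = check(url)
            print("%s\t%s\t%s" % (status, url, info))

Walking an existing site and extracting the URLs to feed it is the part
the tools above already do well, so I'd only bother with something like
this for the reporting side (title in the report, re-check before
flagging).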
HTH,
Raymond
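
P.S. re: #2 -- short of the new Google API, one cheap trick is to
remember the <title> from the last good check of a URL and, when it
goes dead, turn that title into a plain Google search URL for the
report. Just a hand-built query string, not the API itself; quote_plus
is standard-library Python and google_hunt is a made-up name:

import urllib.parse

def google_hunt(title):
    # Plain query-string search URL, pasted into the report next to
    # the dead link so a human can chase down where the page went.
    return "http://www.google.com/search?q=" + urllib.parse.quote_plus(title)

print(google_hunt("Some Page Title That Went Missing"))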