[Web4lib] Link Checking Services

Tue Jan 30 11:33:17 EST 2007

> -----Original Message-----
> From: web4lib-bounces at webjunction.org
> [mailto:web4lib-bounces at webjunction.org] On Behalf Of Valerie Reid
> Sent: Tuesday, January 30, 2007 8:24 AM
> To: web4lib at webjunction.org
> Subject: [Web4lib] Link Checking Services
> 
> We currently subscribe to a link-checking service which checks all the
> links on our library's web site and sends me a report on a weekly
> basis...

Without knowing any of the products that have been mentioned so far, I
will point out one thing to look for.  When I've played with link
checkers in the past, they were all pretty reliable about finding
"bad" links - that is, links that, when requested, return something other
than a 200 ("OK") status code.  Unfortunately, there are a lot of
sloppy/clueless/too-clever-by-half webmasters out there, and I have
not seen link checkers that cope well with situations like these:

  - Getting a 200 status page that actually says either "Page Not
    Found" or "Page has moved <a href='foo.html'>here</a>".  With that
    200 status, this won't be reported as a broken link.

  - Getting a 200 from a page that uses JavaScript to redirect you to
    an updated URL.  Again, this won't show up in the report, and the
    real URL may not be checked.

  - Getting a 200 page that says "This domain name is for sale - want
    some herbal supplements?"

  - Getting a 302 (temporary redirect) to a page that says "Page Not
    Found".

  - Getting a 302 to a page that says "You must enable cookies to
    appreciate our wonderful site"

  - Getting a 302 from http://site/real.page to
    http://site/real.page?session=temporary_id.

  - Getting a 500 ("server error"), 401 ("unauthorized") or 403
    ("forbidden") because some misguided bit of browser sniffing balks
    at talking to your link checker.
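Several of these cases can at least be flagged if a checker looks past the status code. What follows is a rough sketch in Python, using only the standard library, of the kind of heuristics I mean. The phrase list, the regular expressions, BROWSER_UA, and the functions fetch and suspect_reasons are all hypothetical illustrations. They are not features of any product mentioned in this thread, and they will produce both false positives and misses.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Hypothetical browser-like User-Agent for retrying sniffing-prone servers.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Hypothetical phrase list drawn from the cases above; extend it as needed.
SOFT_ERROR_PHRASES = (
    "page not found",
    "page has moved",
    "domain name is for sale",
    "enable cookies",
)

# Rough patterns for client-side redirects; obfuscated scripts will slip past.
JS_REDIRECT = re.compile(r"\blocation(\.href)?\s*=(?!=)|\blocation\.replace\(", re.I)
META_REFRESH = re.compile(r"<meta[^>]+http-equiv=[\"']?refresh", re.I)


def fetch(url, user_agent=None, timeout=10.0):
    """Return (status, final_url, body) after urllib follows any redirects.

    With user_agent=None, urllib sends its default Python-urllib User-Agent.
    Network failures (urllib.error.URLError) propagate to the caller.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.status, resp.geturl(), resp.read().decode(charset, "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.geturl(), ""


def suspect_reasons(requested_url, status, final_url, body):
    """List reasons a link deserves a human look, even when status is 200.

    status, final_url and body describe the last response after any HTTP
    redirects were followed, so a 302 to a "Page Not Found" page is judged
    by the text of that final page.
    """
    reasons = []
    if status in (401, 403, 500):
        # Often browser sniffing rather than a dead page: retry with a
        # browser-like User-Agent before reporting it.
        reasons.append(f"HTTP {status}: retry with a browser User-Agent")
    elif status != 200:
        reasons.append(f"HTTP {status}")

    text = body.lower()
    for phrase in SOFT_ERROR_PHRASES:
        if phrase in text:
            reasons.append(f"page text says {phrase!r}")

    if JS_REDIRECT.search(body) or META_REFRESH.search(body):
        reasons.append("client-side redirect: check its target as well")

    # Ignore redirects that only change the query string (e.g. an added
    # session ID); report moves to a different host or path.
    req, fin = urlsplit(requested_url), urlsplit(final_url)
    if (req.netloc.lower(), req.path) != (fin.netloc.lower(), fin.path):
        reasons.append(f"redirected to {final_url}")
    return reasons
```

A redirect from http://site/real.page to http://site/real.page?session=temporary_id changes only the query string, so this sketch does not report it, while a move to a different path or host is listed for review. A link that comes back 401, 403 or 500 can be fetched again with fetch(url, user_agent=BROWSER_UA) before it goes into the report.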

Is there anything out there that addresses any of these issues in any
way?

-- 
Thomas Dowling
tdowling at ohiolink.edu