[WEB4LIB] checking link content

Fri Dec 7 15:30:33 EST 2001

At 01:48 PM 12/7/2001, Reed, Tracey wrote:
>Happy Holiday Greetings to all!
>
>I've been thinking about all the sites that have been, for lack of a better
>word, hijacked by the adult entertainment industry.  As I've read in a
>couple of places recently, it's not enough (was it ever?) to check that your
>links GO somewhere, but also that they are still actually the site you think
>they are. This can be very labor-intensive as it seems to me that you have
>to manually check each link on your site.  This, for many, is a HUGE
>proposition.

I don't know what commercial link checkers are capable of, but it strikes 
me that porn sites *want* to be found via search engines, and that it would 
be simple to scan returned pages for a handful of, um, characteristic 
phrases that are likely to be on a porn page and very unlikely to be on 
other pages.  A link checker could easily flag such pages for manual checking.
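
As a rough sketch of that idea in Python: the phrase list, the function 
names, and the two-hit threshold below are all placeholders of my own, not 
features of any existing link checker, and a real list would need tuning 
against actual pages.  The raw HTML is scanned, not just the visible text, 
because keyword-stuffed meta tags are part of what makes these pages easy 
to spot.

```python
import re

# Hypothetical phrase list (an assumption for illustration only).
SUSPECT_PHRASES = ["adults only", "must be 18", "explicit content"]


def suspect_hits(html, phrases=SUSPECT_PHRASES):
    """Return the phrases from `phrases` that occur anywhere in the page.

    Matching is case-insensitive and runs over the raw HTML, so text in
    meta keyword tags counts as well as text in the body.
    """
    # Collapse runs of whitespace so phrases split across lines still match.
    text = re.sub(r"\s+", " ", html).lower()
    return [p for p in phrases if p in text]


def needs_manual_check(html, min_hits=2):
    """Flag a page for human review once enough phrases are found.

    min_hits is an arbitrary threshold; requiring more than one hit is a
    crude way to cut down on false alarms from a single stray phrase.
    """
    return len(suspect_hits(html)) >= min_hits
```

The checker would call needs_manual_check() on the body of each page it 
fetches and add any flagged URL to a list for someone to look at by hand; 
it only narrows the manual work, it does not replace it.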

A related problem is when well-meaning sites move, create a page at the old 
site pointing to the new one, and leave it there for years, happily 
returning a good 200 status to all link checkers.  Of course, a checker 
could also grep for "this page has moved", but the phrasing there is a 
little less predictable.
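
A similar sketch for the "we've moved" case, again in Python.  The patterns 
here are guesses at common wording (my assumption, not a tested list), and 
precisely because the phrasing varies they will miss some pages and 
occasionally flag an innocent one.  A meta refresh tag is included as an 
extra hint, since stub pages often use one to forward visitors.

```python
import re

# Guessed wordings for "moved" notices (assumptions; extend as needed).
MOVED_PATTERNS = [
    r"\b(page|site)\s+has\s+(been\s+)?moved\b",
    r"\bhas\s+a\s+new\s+(address|location|home)\b",
    r"\bupdate\s+your\s+(bookmarks?|links?)\b",
]

# A <meta http-equiv="refresh" ...> tag, with or without quotes.
META_REFRESH = re.compile(r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.I)


def looks_moved(html):
    """Return True if a page that returned 200 reads like a forwarding stub."""
    # Collapse whitespace so a notice wrapped across lines still matches.
    text = re.sub(r"\s+", " ", html)
    if META_REFRESH.search(text):
        return True
    return any(re.search(p, text, re.I) for p in MOVED_PATTERNS)
```

As with the phrase scan, a hit here only means the link deserves a manual 
look; the checker still has to fetch the page body, not just the status.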

Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu