[WEB4LIB] checking link content
Thomas Dowling
tdowling at ohiolink.edu
Fri Dec 7 15:30:33 EST 2001
At 01:48 PM 12/7/2001, Reed, Tracey wrote:
>Happy Holiday Greetings to all!
>
>I've been thinking about all the sites that have been, for lack of a better
>word, hijacked by the adult entertainment industry. As I've read in a
>couple of places recently, it's not enough (was it ever?) to check that your
>links GO somewhere, but also that they are still actually the site you think
>they are. This can be very labor intensive as it seems to me that you have
>to manually check each link on your site. This, for many, is a HUGE
>proposition.
I don't know what commercial link checkers are capable of, but it strikes
me that porn sites *want* to be found via search engines, and that it would
be simple to scan returned pages for a handful of, um, characteristic
phrases that are likely to be on a porn page and very unlikely to be on
other pages. A link checker could easily flag such pages for manual checking.
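
As a rough sketch of that idea, assuming the checker has already fetched the page text: the phrase list and the function names below are made up for illustration, and a real list would need hand tuning to keep false positives down.

```python
# Hypothetical phrase list, for illustration only; tune by hand.
SUSPECT_PHRASES = ["adults only", "must be 18", "xxx"]

def suspicious_phrases(page_text, phrases=SUSPECT_PHRASES):
    """Return the phrases that occur in page_text, ignoring case."""
    lowered = page_text.lower()
    return [p for p in phrases if p in lowered]

def needs_manual_check(page_text, phrases=SUSPECT_PHRASES, threshold=1):
    """Flag the page for a human to look at if enough phrases match."""
    return len(suspicious_phrases(page_text, phrases)) >= threshold
```

A link checker would call needs_manual_check() on each page that comes back with a 200 status and add the flagged URLs to a report, rather than removing any link automatically.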
A related problem is when well-meaning sites move, create a page at the old
address pointing to the new one, and leave it there for years, happily
returning a good 200 status to all link checkers. Of course, a checker
could also grep for "this page has moved", but the phrasing there is a
little less predictable.
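
Because the wording varies, a checker would probably want a few loose patterns rather than one fixed string. A minimal sketch, assuming the page HTML is already in hand: the patterns and the looks_moved() name are my own guesses, not a tested list. The last pattern looks for a meta refresh tag, which is another common sign of a forwarding page.

```python
import re

# Guessed patterns for "we have moved" pages; extend as needed.
MOVED_PATTERNS = [
    r"\b(page|site)\s+has\s+(been\s+)?moved\b",
    r"\bhas\s+a\s+new\s+(address|location|home)\b",
    r"\bupdate\s+your\s+(bookmarks?|links?)\b",
    r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh",
]
MOVED_RE = re.compile("|".join(MOVED_PATTERNS), re.IGNORECASE)

def looks_moved(html):
    """Return True if the page reads like a forwarding notice."""
    return MOVED_RE.search(html) is not None
```

As with the phrase scan, a match is only a reason for someone to look at the link by hand, since a page can mention moving without being a forwarding notice.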
Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu