[Web4lib] More fun with google

Sun Jun 19 17:59:15 EDT 2005

Karen Coyle wrote:

> First, does robots.txt actually *prevent* access? Couldn't someone 
> choose to ignore it? Has anyone ever taken someone to court for 
> ignoring a robots.txt command?

Not only can a poorly behaved crawler choose to ignore robots.txt, it 
could also read the Disallow lines to guess which areas of your web 
site are most sensitive.  Of course, you could bait it with a bogus 
directory (Disallow: /honeypot) and block any client that requests it.  
I'm sure Google has a lot of tricks up its sleeve to spot and stop 
crawlers.
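
For what it's worth, here is a rough sketch of the honeypot idea in 
Python.  It assumes your server writes access logs in something like 
Common Log Format, and that /honeypot is listed under Disallow in 
robots.txt and linked from nowhere.  The function name, the regex, and 
the trap path are just illustrations, not anything a particular server 
gives you.

```python
import re

# Hypothetical trap path; it should match the Disallow line in robots.txt.
HONEYPOT_PREFIX = "/honeypot"

# Start of a Common Log Format line: client address, identity, user,
# bracketed timestamp, then the quoted request line (method and path).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')


def find_trapped_clients(log_lines, prefix=HONEYPOT_PREFIX):
    """Return the set of client addresses that requested the trap path."""
    trapped = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        client, target = match.groups()
        path = target.split("?", 1)[0]  # ignore any query string
        # Match the trap directory itself or anything beneath it,
        # but not unrelated paths such as /honeypots.html.
        if path == prefix or path.startswith(prefix + "/"):
            trapped.add(client)
    return trapped
```

You would feed it the lines of an access log and pass the resulting 
addresses to whatever does your blocking (firewall rules, a deny list 
in the server config, and so on).  A well-behaved crawler never shows 
up in the result, since it honors the Disallow line.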

-- 
Thomas Dowling
tdowling at ohiolink.edu