[Web4lib] More fun with google

Thomas Dowling tdowling at ohiolink.edu
Sun Jun 19 17:59:15 EDT 2005


Karen Coyle wrote:

> First, does robots.txt actually *prevent* access? Couldn't someone 
> choose to ignore it? Has anyone ever taken someone to court for 
> ignoring a robots.txt command?



Not only can a poorly behaved crawler choose to ignore robots.txt, it 
could use it to guess what the most sensitive areas of your web site 
are.  Of course, you could also bait it with some bogus directories 
(Disallow: /honeypot) and block any clients that request them.  I'm 
sure Google has a lot of tricks up its sleeve to spot and stop crawlers.
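
To make the bait idea concrete, here is a rough sketch rather than a tested setup. It assumes the server writes Common Log Format lines and that /honeypot is listed under Disallow in robots.txt and linked from nowhere else, so anything that requests it has most likely read robots.txt and ignored it. The names find_offenders and sample_log and the addresses are invented for illustration.

```python
import re

# Hypothetical trap path; it must match the Disallow line in robots.txt.
HONEYPOT_PREFIX = "/honeypot"

# Start of a Common Log Format line: client address, then the quoted request.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:[A-Z]+) (\S+)')


def find_offenders(log_lines, prefix=HONEYPOT_PREFIX):
    """Return the set of client addresses that requested the trap path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        client, target = match.groups()
        path = target.split("?", 1)[0]  # ignore any query string
        # Match the directory itself or anything beneath it, but not
        # unrelated paths that merely share the prefix (/honeypots.html).
        if path == prefix or path.startswith(prefix + "/"):
            offenders.add(client)
    return offenders


# Made-up log lines using documentation-only addresses.
sample_log = [
    '192.0.2.10 - - [19/Jun/2005:17:59:15 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '198.51.100.7 - - [19/Jun/2005:18:01:02 -0400] "GET /robots.txt HTTP/1.0" 200 68',
    '198.51.100.7 - - [19/Jun/2005:18:01:05 -0400] "GET /honeypot/ HTTP/1.0" 404 210',
    '203.0.113.4 - - [19/Jun/2005:18:02:00 -0400] "GET /honeypots.html HTTP/1.0" 200 512',
]

print(sorted(find_offenders(sample_log)))
```

What you do with the resulting addresses (a firewall rule, a deny list in the server config) is a separate decision; this only shows the detection step.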


-- 
Thomas Dowling
tdowling at ohiolink.edu




