[Web4lib] More fun with google
Thomas Dowling
tdowling at ohiolink.edu
Sun Jun 19 17:59:15 EDT 2005
Karen Coyle wrote:
> First, does robots.txt actually *prevent* access? Couldn't someone
> choose to ignore it? Has anyone ever taken someone to court for
> ignoring a robots.txt command?
Not only can a poorly behaved crawler choose to ignore robots.txt, it
could use it to guess what the most sensitive areas of your web site
are. Of course, you could also bait it with some bogus directories
(Disallow: /honeypot) and block any clients that request those. I'm
sure Google has a lot of tricks up its sleeve to spot and stop crawlers.
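
For what it's worth, the honeypot idea can be sketched in a few lines of
Python. This is only a rough illustration, not something I run in
production: it assumes the web server writes Common Log Format access
logs, that robots.txt already lists the trap directory, and the names
HONEYPOT_PREFIX, honeypot_clients, and the sample log lines are all made
up for the example.

```python
import re

# Hypothetical trap path. It is assumed to be listed in robots.txt as:
#   User-agent: *
#   Disallow: /honeypot/
HONEYPOT_PREFIX = "/honeypot"

# Matches the start of a Common Log Format line:
# client address, two ignored fields, [timestamp], then "METHOD PATH ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)')


def honeypot_clients(log_lines):
    """Return the set of client addresses that requested the trap path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        client, path = match.groups()
        # Flag the trap directory itself and anything beneath it, but not
        # unrelated paths that merely share the prefix.
        if path == HONEYPOT_PREFIX or path.startswith(HONEYPOT_PREFIX + "/"):
            offenders.add(client)
    return offenders


# Made-up sample data using documentation-only IP ranges.
sample_log = [
    '192.0.2.10 - - [19/Jun/2005:17:59:15 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '198.51.100.7 - - [19/Jun/2005:18:01:02 -0400] "GET /honeypot/secret.html HTTP/1.0" 404 210',
    '192.0.2.10 - - [19/Jun/2005:18:02:40 -0400] "GET /honeypot-news.html HTTP/1.0" 200 512',
]

print(sorted(honeypot_clients(sample_log)))
```

Since no compliant crawler and no human following links should ever ask
for that path, any address the function returns is a reasonable
candidate for a deny rule. How you actually block it (firewall, server
config) is a separate decision that this sketch leaves out.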
--
Thomas Dowling
tdowling at ohiolink.edu