Attempts to access robots.txt
Carlos I McEvilly
cim at c3serve.c3.lanl.gov
Thu Sep 21 17:36:58 EDT 1995
On Thu, 21 Sep 1995, Web4Lib Moderator wrote:
> On another note: my access log notes about a dozen attempts to
> retrieve a file called robots.txt from the root directory of the
> server. Such a file has never existed here, and my colleagues who
These access attempts are coming from well-behaved web
spiders or robots that look for this file when they
first contact a server. If they find the file, they
restrict their own access based on the instructions
in the file. A typical robots.txt file can be found
at http://www.c3.lanl.gov:8075/robotExclude.html
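For example, a minimal robots.txt is just a plain text
file of User-agent and Disallow lines; the directory
names here are only placeholders:

    # Apply these rules to all robots
    User-agent: *
    # Keep robots out of these (hypothetical) directories
    Disallow: /cgi-bin/
    Disallow: /drafts/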
If you don't mind these indexing engines indexing the
pages on your server (normally the case), then there is
no reason to have a robots.txt file. You'll still
notice periodic access attempts in the logs, as Lycos
and other search robots will check for the file before
reindexing your site. The only times you might want
a robots.txt file are if you have thousands of
documents and are afraid that having a spider walk
through them all would bog down your server, or if
you want to prevent well-behaved robots from accessing
certain URLs that you don't want indexed.
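As a rough sketch of what a well-behaved robot does, here
is how one might check the rules in Python (the standard
urllib.robotparser module does the parsing; the server
name, robot name, and URL below are made up):

    import urllib.robotparser

    # Fetch and parse the server's robots.txt before crawling
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.org/robots.txt")  # hypothetical server
    rp.read()

    # Only fetch a page if the rules allow it for this robot's name
    if rp.can_fetch("MyRobot", "http://www.example.org/drafts/report.html"):
        print("allowed to fetch")
    else:
        print("disallowed by robots.txt")

A robot that skips this check (or ignores the answer) will
still get your pages; robots.txt only restrains the
cooperative ones.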
More details are at:
http://web.nexor.co.uk/mak/doc/robots/norobots.html
Carlos McEvilly
cim at lanl.gov