Attempts to access robots.txt
Carlos I McEvilly
cim at c3serve.c3.lanl.gov
Thu Sep 21 17:36:58 EDT 1995
On Thu, 21 Sep 1995, Web4Lib Moderator wrote:
> On another note: my access log notes about a dozen attempts to
> retrieve a file called robots.txt from the root directory of the
> server. Such a file has never existed here, and my colleagues who
These access attempts are coming from well-behaved web
spiders or robots that look for this file when they
first contact a server. If they find the file, they
restrict their own access based on the instructions
in the file. A typical robots.txt file can be found
at http://www.c3.lanl.gov:8075/robotExclude.html
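For example, a minimal robots.txt is just a plain text
file of User-agent and Disallow lines; the directory
names here are only placeholders:

    # Apply these rules to all robots
    User-agent: *
    # Keep robots out of these (hypothetical) directories
    Disallow: /cgi-bin/
    Disallow: /drafts/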
If you don't mind these indexing engines indexing the
pages on your server (normally the case), then there is
no reason to have a robots.txt file. You'll still
notice periodic access attempts in the logs, as Lycos
and other search robots will check for the file before
reindexing your site. The only times you might want
a robots.txt file are if you have thousands of
documents and are afraid that having a spider walk
through them all would bog down your server, or if
you want to prevent well-behaved robots from accessing
certain URLs that you don't want indexed.
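As a rough sketch of what a well-behaved robot does, here
is how one might check the rules in Python (the standard
urllib.robotparser module does the parsing; the server
name, robot name, and URL below are made up):

    import urllib.robotparser

    # Fetch and parse the server's robots.txt before crawling
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.org/robots.txt")  # hypothetical server
    rp.read()

    # Only fetch a page if the rules allow it for this robot's name
    if rp.can_fetch("MyRobot", "http://www.example.org/drafts/report.html"):
        print("allowed to fetch")
    else:
        print("disallowed by robots.txt")

A robot that skips this check (or ignores the answer) will
still get your pages; robots.txt only restrains the
cooperative ones.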
More details are at:
http://web.nexor.co.uk/mak/doc/robots/norobots.html
Carlos McEvilly
cim at lanl.gov