Robots

tdowling at lib.washington.edu
Tue Sep 26 11:54:12 EDT 1995


http://info.webcrawler.com/mak/projects/robots/norobots.html

Note that the file you need is robots.txt, not robots.html.
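
For example, a minimal /robots.txt placed at the server root might look like the following (an illustrative sketch, not from the original post; the first rule below would ask all robots to skip a hypothetical /private/ directory, and a lone "Disallow: /" would exclude the whole server):

```
User-agent: *
Disallow: /private/
```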

Thomas Dowling
Networked Information Librarian, Public Services
University of Washington Libraries
tdowling at u.washington.edu


Note from:  Mia Massicotte <MIAMASS at VAX2.CONCORDIA.CA>
Thu, 21 Sep 95 17:07:39 PDT
----------------------------------------
 % Walter W. Giesbrecht  (walterg at yorku.ca) asked:
 % 
 % >On another note: my access log notes about a dozen attempts to
 % >retrieve a file called robots.txt from the root directory of the
 % >server. Such a file has never existed here, and my colleagues who
 % >manage other servers on campus have noticed the same thing. Several
 % >of them have come from query2.lycos.cs.cmu.edu, which might be a
 % >Lycos search & index attempt; others are not obvious. Any ideas?
 % 
 % If I recall, robots.html is a file you include on your server if you do not
 % want your server to be hit by a robot.  The robot looks for such a file; if it
 % exists, the robot disregards your server for harvesting.  There is an
 % explanation of how robots work on the net somewhere, and this is explained. 
 % Sorry, I don't have the URL on hand; perhaps someone else does?
 % 
 % Mia Massicotte, Systems Librarian
 % Concordia University Library, Montreal, Quebec CANADA
 % miamass at vax2.concordia.ca
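
The check Mia describes can be sketched with Python's standard urllib.robotparser module (a modern illustration, not something from the original thread; the robots.txt content and URLs below are hypothetical):

```python
# Sketch of how a well-behaved robot honors robots.txt before fetching pages.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a server might publish at its root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The robot checks each candidate URL against the rules for its user-agent.
print(parser.can_fetch("Lycos", "http://example.org/private/page.html"))  # False
print(parser.can_fetch("Lycos", "http://example.org/public/page.html"))   # True
```

A robot that finds no /robots.txt at all (the 404s in Walter's access log) simply treats the server as unrestricted, which is why the retrieval attempts are harmless.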



More information about the Web4lib mailing list