protecting web servers from robots

Bill Crosbie crosbie at AESOP.RUTGERS.EDU
Fri Jun 14 10:38:13 EDT 1996


At 03:52 PM 6/13/96 -0700, chris at sparkie.osl.state.or.us wrote:
>Can anyone provide information about how to protect web servers from the
>robots that roam the internet, such as Altavista, Lycos, etc. I have several
>Unix servers and  a Mac server. I know that I have heard of a way to do
>this, but can't remember what it was.
>
>Christopher Adams
>Oregon State Library
>chris at sparkie.osl.state.or.us
>
>

Chris,

You need to place a file named robots.txt at the top level of your web
hierarchy.  Inside the robots.txt file, you can specify which robots to
disallow and which to allow.

If you want to keep out all intruders, use:

# go away
User-agent: *
Disallow: /


NOTE:  This depends on the robot honoring the Robots Exclusion Standard.
Nothing but courtesy compels a spider to stay out of your site; a badly
behaved robot can simply ignore the file.
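You don't have to shut out everything.  For example, to keep one robot out
of one directory while leaving the rest of the site open (the agent name
and path below are illustrative, not special values):

```
# exclude a single crawler from a single directory
User-agent: WebCrawler
Disallow: /private/

# all other robots may index everything
User-agent: *
Disallow:
```

Records are separated by blank lines; an empty Disallow means "no
restrictions" for that agent.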

For more information, hit:
<http://info.webcrawler.com/mak/projects/robots/norobots.html>
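If you want to double-check how a compliant crawler will read your file,
you can test the rules programmatically.  Here is a minimal sketch using
Python's standard-library robots.txt parser (the user-agent name and URL
are just examples):

```python
# Sketch: verify that a "deny everything" robots.txt actually denies
# access, using Python's standard urllib.robotparser module.
import urllib.robotparser

# The same two-line policy shown above.
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Any compliant robot, under any name, is refused every path.
print(rp.can_fetch("Lycos", "/index.html"))   # prints False
print(rp.can_fetch("Scooter", "/staff/"))     # prints False
```

This only tells you what a rule-abiding robot would do; it is no defense
against one that never fetches robots.txt at all.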
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you board the wrong train,		Bill Crosbie
it's no use running along the 		Microcomputer Analyst
corridor in the other direction.	Chang Science Library
   	 -Dietrich Bonhoeffer		Rutgers University
					New Brunswick, NJ USA
					crosbie at aesop.rutgers.edu
					908-932-0305 x114


