protecting web servers from robots
Bill Crosbie
crosbie at AESOP.RUTGERS.EDU
Fri Jun 14 10:38:13 EDT 1996
At 03:52 PM 6/13/96 -0700, chris at sparkie.osl.state.or.us wrote:
>Can anyone provide information about how to protect web servers from the
>robots that roam the internet, such as Altavista, Lycos, etc. I have several
>Unix servers and a Mac server. I know that I have heard of a way to do
>this, but can't remember what it was.
>
>Christopher Adams
>Oregon State Library
>chris at sparkie.osl.state.or.us
>
>
Chris,
You need to place a file named robots.txt at the top level of your web
hierarchy, so that it is reachable as /robots.txt on your server. Inside
the robots.txt file, you can specify which robots to disallow and which
to allow.
If you want to keep out all intruders, use:
# go away
User-agent: *
Disallow: /
NOTE: This depends entirely on the robot obeying the standard for robot
exclusion. Nothing but courtesy compels a spider to honor it, so a
badly behaved robot can still crawl your web.
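If you only want to keep robots out of part of your site, or turn away
one particular robot, you can write separate records. The directory
names and the agent name below are just examples -- substitute the ones
that apply to your own server:

# keep all robots out of the cgi-bin and staff areas
User-agent: *
Disallow: /cgi-bin/
Disallow: /staff/

# turn away one particular robot entirely
User-agent: Lycos
Disallow: /

Records are separated by blank lines; a robot obeys the record that
names it and falls back to the "*" record otherwise.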
For more information, hit:
<http://info.webcrawler.com/mak/projects/robots/norobots.html>
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you board the wrong train, Bill Crosbie
it's no use running along the Microcomputer Analyst
corridor in the other direction. Chang Science Library
-Dietrich Bonhoeffer Rutgers University
New Brunswick, NJ USA
crosbie at aesop.rutgers.edu
908-932-0305 x114
More information about the Web4lib mailing list