[WEB4LIB] no robots

Fri Jan 19 09:31:38 EST 2001

----- Original Message -----
From: "Stéphane Dudart" <dudart at bse.ucl.ac.be>
To: "Multiple recipients of list" <web4lib at webjunction.org>
Sent: Friday, January 19, 2001 9:13 AM
Subject: [WEB4LIB] no robots

> Could you tell me how to abort robots indexation on html files centralized
> in a same folder?
>
> Thank you

The two main methods are documented at
<URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html>.  The
first, and oldest, is the Standard for Robot Exclusion, which uses a
robots.txt file placed at the server's document root.  A single entry in
that file can exclude an entire folder.
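
As a rough illustration of the first method, the sketch below holds a
two-line robots.txt and checks it with Python's standard
urllib.robotparser module.  The folder name /private/, the agent name
ExampleBot, and the helper is_allowed are assumptions made up for this
example; substitute the folder that actually holds the files.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt served from the document root.
# "/private/" is a hypothetical folder name used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "/private/page.html"))
    print(is_allowed(ROBOTS_TXT, "/index.html"))
```

This only models what a well-behaved crawler would conclude from the
file; it does nothing to stop a crawler that never reads robots.txt.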

The second uses <META NAME="ROBOTS"...> tags in each affected document.
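
For the second method, a minimal sketch of the tag and of how a
compliant crawler might read it, using Python's standard html.parser
module.  The sample page, the class RobotsMetaParser, and the helper
robots_directives are hypothetical names for this example; NOINDEX and
NOFOLLOW are the usual values for keeping a page out of an index and
telling the robot not to follow its links.

```python
from html.parser import HTMLParser

# A hypothetical page carrying the robots META tag in its <HEAD>.
PAGE = """<HTML><HEAD>
<TITLE>Internal notes</TITLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</HEAD><BODY>...</BODY></HTML>"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <META NAME="ROBOTS"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for item in content.split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots directives declared in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

Unlike robots.txt, the tag has to be present in every document you want
excluded, so it suits a handful of pages better than a large folder.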

According to Search Engine Watch, all the crawlers they monitor observe
the first method, and all but Excite observe the second.  Of course, both
are routinely ignored by a bunch of other programs; in the long run, you
can't really prevent a misbehaving crawler from indexing any site it can
get to.

Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu