Hiding draft pages from browsers, search engines
Thomas Dowling
tdowling at ohiolink.edu
Sun Mar 22 12:23:42 EST 1998
-----Original Message-----
From: Web Publishers Virtual Library <arnett at alink.net>
To: Multiple recipients of list <web4lib at library.berkeley.edu>
Date: Sunday, March 22, 1998 11:26 AM
Subject: Re: Hiding draft pages from browsers, search engines
>At 06:53 AM 3/22/98 -0800, morganj at iupui.edu wrote:
>>However, if there is an index or home file is there any way to
>>force a browser to bypass it and list the files in the directory?
>
>Generally not. Since this is a security issue, I'd hesitate to say it is
>flat-out impossible, but via the Web, it probably is. If you're also
>running ftp or gopher, this is not true for them.
When an HTTP GET request points to a directory, it is the *server*, not the
client, that determines what gets sent back. For any competent server, this
is either a file whose name matches a pre-configured list (index.html,
default.htm, index.cgi--whatever is set up on your server); OR a formatted
list of files comparable to a Unix "ls -l" command; OR an error message
explaining that directory browsing is prohibited. Since it's the server's
choice what to send, this is pretty secure--assuming an appropriately
configured server.
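For example, on an Apache server (other servers use different directive
names, so take this as a sketch; the directory path below is made up),
the relevant configuration looks something like:

# The pre-configured list of default files, tried in order:
DirectoryIndex index.html default.htm index.cgi

<Directory /usr/local/www/drafts>
    # Drop Indexes from Options and a request for the bare directory
    # gets the "browsing prohibited" error instead of a file listing.
    Options -Indexes
</Directory>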
However, this assumes that no one has written a link to your specific draft
document on a page that is robot-accessible. If the page is on your server
so that a group of people can review it, someone might write a link to it on
their personal home page. A better guarantee of keeping this document
private is to protect it with a password and/or IP restrictions.
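With Apache, for instance, a per-directory .htaccess file can do either
one (assuming AllowOverride permits it; the file names and network
address below are hypothetical):

# Require a username/password from a password file:
AuthType Basic
AuthName "Draft Review"
AuthUserFile /usr/local/www/private/.htpasswd
require valid-user

# Or restrict by IP address instead:
# order deny,allow
# deny from all
# allow from 129.79.

Reviewers then have to present a password (or connect from a matching
address) before the server will hand over the draft at all, no matter
who links to it.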
>
>>Secondly, can these "hidden" files be indexed by search engines?
>
>If you mean search engines that are indexing via the Web, they won't find
>those files unless there's a link to them. I am not aware of any robot
>that even tries to retrieve directories unless there is an explicit link to
>them. This means that even if you don't have a default page for the
>directory, a search engine robot probably won't find the directory listing
>unless there is a link explicitly pointing to it. However, a locally
>running robot that uses the file system, rather than the Web, typically
>would find them.
>
>> However can search engines be set to ignore the index
>>and home files and index all files in a directory on a remote web server?
>
>There's really no standard for including files and directories in robot
>directives. There is only a de facto standard for excluding them
>(robots.txt). See
>http://info.webcrawler.com/mak/projects/robots/norobots.html
If you don't have write permissions in your server's document root (where
robots.txt lives), you can try including this in any HTML document's HEAD:
<META NAME="ROBOTS" CONTENT="NOINDEX">
Be aware that this is not as widely supported as robots.txt.
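For comparison, if you can get a robots.txt into the document root,
excluding a whole directory takes just two lines (the /drafts/ path
here is only an example):

User-agent: *
Disallow: /drafts/

Well-behaved robots request /robots.txt first and skip anything under a
Disallow'd path; the META tag above is a per-document fallback for when
you can't edit that file.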
Thomas Dowling
Ohio Library and Information Network
tdowling at ohiolink.edu