Hiding draft pages from browsers, search engines

Web Publishers Virtual Library arnett at alink.net
Sun Mar 22 11:11:18 EST 1998


At 06:53 AM 3/22/98 -0800, morganj at iupui.edu wrote:
>However, if there is an index or home file is there any way to
>force a browser to bypass it and list the files in the directory?

Generally not.  Since this is a security issue, I'd hesitate to say it is
flat-out impossible, but via the Web, it probably is.  That does not hold
for ftp or gopher: if you're also running either of those, it may still
expose the directory listing.

>Secondly, can these "hidden" files be indexed by search engines?

If you mean search engines that are indexing via the Web, they won't find
those files unless there's a link to them.  I am not aware of any robot
that even tries to retrieve directories unless there is an explicit link to
them.  This means that even if you don't have a default page for the
directory, a search engine robot probably won't find the directory listing
unless there is a link explicitly pointing to it.  However, a locally
running robot that uses the file system, rather than the Web, typically
would find them.
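
To illustrate why an unlinked file stays invisible to a Web-based robot, here
is a minimal Python sketch of link discovery.  It is not how any particular
search engine works; the host library.example.edu, the page content and the
names LinkCollector and discover are all made up for the example.  The point
is only that the robot learns about URLs from the links it parses, so a file
nobody links to never enters its list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> tags found in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def discover(html, base_url):
    """Return every URL a link-following robot would learn from this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical index page: it links to public.html only.  A draft.html
# sitting in the same directory is never mentioned, so it is never found.
page = '<html><body><a href="public.html">Public page</a></body></html>'
found = discover(page, "http://library.example.edu/docs/")
```

A robot that walks the file system instead would see draft.html directly,
which is the difference described above.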

>  However can search engines be set to ignore the index
>and home files and index all files in a directory on a remote web server?

There's really no standard for including files and directories in robot
directives.  There is only a de facto standard for excluding them
(robots.txt).  See
http://info.webcrawler.com/mak/projects/robots/norobots.html
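
As a rough sketch of what the exclusion convention looks like in practice,
the Python below parses a two-line robots.txt and asks whether a robot may
fetch two URLs.  It uses the standard library's urllib.robotparser; the
directory name /staff-email/, the host library.example.edu and the robot
name ExampleBot are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every robot to stay out of /staff-email/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff-email/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What a well-behaved robot would conclude before fetching each URL.
email_ok = parser.can_fetch(
    "ExampleBot", "http://library.example.edu/staff-email/list.html"
)
home_ok = parser.can_fetch(
    "ExampleBot", "http://library.example.edu/index.html"
)
```

Note that the file only says what to leave out.  Nothing in it can tell a
robot to go and index files it has no link to.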

>This came to mind when I contemplated having a local email directory, and
>began to think about how to make it available to individuals within the
>library but not to email spammers search engines.

You could use robots.txt to exclude the entire directory from
*well-behaved* robots.  However, it's simple to create a robot that ignores
robots.txt, so this alone probably won't keep address harvesters out.  Many
Web servers allow you to limit access to a directory by IP address.  That's
probably the thing to do -- only allow your own institution's addresses to
access the email directory.
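
The IP address check itself is simple.  The Python sketch below shows the
kind of test a server applies before serving a restricted directory; it is
an illustration of the logic, not the configuration syntax of any particular
Web server.  The network 192.0.2.0/24 is a documentation-only range standing
in for your institution's addresses, and ALLOWED_NETWORKS and is_allowed are
names invented for the example.

```python
import ipaddress

# Assumption: the institution's address block.  192.0.2.0/24 is reserved
# for documentation, so substitute your real range(s) here.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(client_ip):
    """Return True if the client's address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)


inside = is_allowed("192.0.2.17")    # a machine on the institution's network
outside = is_allowed("203.0.113.5")  # any other host, robots included
```

Unlike robots.txt, this check is enforced by the server, so it does not
depend on the robot being well-behaved.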

Nick

