[Web4lib] More fun with google
Eric Hellman
eric at openly.com
Sat Jun 18 22:48:51 EDT 2005
At 2:55 PM -0700 6/18/05, Karen Coyle wrote:
>Second, it is ironic for a company that has made its fortune sucking
>up the contents of other people's web sites that their own is almost
>entirely covered by their "disallows." What is stated in the
>contract appears to be Google's general practice:
>(Google's robots.txt)
Karen,
It's not irony.
Google is following the golden rule: do unto others as you would have
them do unto you. Google politely marks with "DISALLOW" the sections
of the website that are dynamic. These are sections that polite
robots and spiders want to stay away from, because they are
effectively infinite content spaces.
For example, the Scholar robots.txt excludes
/scholar?
but not
/scholar/
so all the static content about scholar is available to robots.
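
To make the prefix distinction concrete, here is a rough sketch in Python
of how a polite robot might apply that rule. It assumes plain prefix
matching only (no wildcards, no Allow lines), and the two request paths
are made-up examples, not real pages I have checked.

```python
# Simplified sketch of robots.txt Disallow matching: plain prefix
# comparison only, with no wildcard or Allow handling.

# The single rule quoted in the message above.
DISALLOW_PREFIXES = ["/scholar?"]


def is_allowed(path_and_query, disallowed=DISALLOW_PREFIXES):
    """Return True if no Disallow prefix matches the start of the request."""
    return not any(path_and_query.startswith(prefix) for prefix in disallowed)


# Hypothetical request paths, chosen only to illustrate the rule.
for request in ["/scholar?q=library+catalogs", "/scholar/about.html"]:
    verdict = "allowed" if is_allowed(request) else "disallowed"
    print(request, verdict)
```

The query-style path starts with "/scholar?" and is skipped, while the
path under "/scholar/" does not match the prefix and stays crawlable.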
--
Eric Hellman, President Openly Informatics, Inc.
eric at openly.com 2 Broad St., 2nd Floor
tel 1-973-509-7800 fax 1-734-468-6216 Bloomfield, NJ 07003
http://www.openly.com/1cate/ 1 Click Access To Everything