[Web4lib] More fun with google

Eric Hellman eric at openly.com
Sat Jun 18 22:48:51 EDT 2005


At 2:55 PM -0700 6/18/05, Karen Coyle wrote:
>Second, it is ironic for a company that has made its fortune sucking 
>up the contents of other people's web sites that their own is almost 
>entirely covered by their "disallows." What is stated in the 
>contract appears to be Google's general practice:
>(Google's robots.txt)

Karen,

It's not irony.

Google is following the golden rule: do unto others as you would have 
them do unto you. Google politely marks with "Disallow" the sections 
of its website that are dynamic. These are sections that polite 
robots and spiders want to stay away from because they are infinite 
content spaces (every possible query yields yet another page).

For example, the Scholar robots.txt excludes
/scholar?
but not
/scholar/
so all the static content about Scholar is available to robots.
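
To make the distinction concrete, here is a rough sketch of the simple 
prefix matching that robots.txt "Disallow" lines rely on. The helper 
name is_disallowed and the two sample request targets are made up for 
illustration; a real crawler also handles User-agent groups, Allow 
lines, and wildcards, none of which is modeled here.

```python
from typing import Iterable


def is_disallowed(target: str, disallow_prefixes: Iterable[str]) -> bool:
    """Return True if the request target (path plus query string)
    starts with any non-empty Disallow prefix.

    Hypothetical helper for illustration only: it models plain prefix
    matching and ignores User-agent groups, Allow lines, and wildcards.
    """
    return any(target.startswith(prefix) for prefix in disallow_prefixes if prefix)


# The single rule discussed above.
rules = ["/scholar?"]

# Assumed sample targets: one dynamic query page, one static page.
for target in ["/scholar?q=robots", "/scholar/about.html"]:
    verdict = "disallowed" if is_disallowed(target, rules) else "allowed"
    print(f"{target}: {verdict}")
```

Under this sketch the query URL is reported as disallowed and the 
static page as allowed, because only the first begins with the exact 
prefix "/scholar?".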

-- 

Eric Hellman, President                            Openly Informatics, Inc.
eric at openly.com                                    2 Broad St., 2nd Floor
tel 1-973-509-7800 fax 1-734-468-6216              Bloomfield, NJ 07003
http://www.openly.com/1cate/      1 Click Access To Everything

