[Web4lib] new scholarly content in (vanilla) Google?

Casey Bisson cbisson at plymouth.edu
Mon Feb 25 12:23:51 EST 2008


On Feb 25, 2008, at 11:23 AM, Rudy Leon wrote:

> If I recall correctly, Google spiders were not allowed to crawl sites
> they did not have full access to, and thus Google Scholar was born to
> have different rules and allow the spidering of journal home pages and
> other password blocked sites. However, this morning's results were in
> plain old vanilla google, and also failed to include the "Full text @"
> links we see from Schoogle results.


JSTOR and Muse, along with a number of other sites, specifically
allow the Google crawlers to index their sites. You can see part of
JSTOR's instructions to Google and other crawlers in its robots.txt:

http://www.jstor.org/robots.txt
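
As a rough illustration of how a crawler reads instructions like
these, here is a minimal Python sketch. The rules in
SAMPLE_ROBOTS_TXT are made up for the example and are not a copy of
JSTOR's actual file; crawler_may_fetch and the example.org URL are
also hypothetical. The parsing itself uses the standard library's
urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not any real site's file):
# Googlebot may crawl everything, every other crawler is shut out.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.org/article/1"  # placeholder URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, crawler_may_fetch(SAMPLE_ROBOTS_TXT, agent, url))
```

To check a live site instead, RobotFileParser also offers set_url()
and read(), which fetch the file over the network.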

They might also be checking whether a request comes from a crawler
and changing the content they return based on that. A good example is
NPR.org, which lets Google index its full-text transcripts but
requires users to pay for them.
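
A minimal sketch of that kind of check, assuming the site decides
based on the User-Agent request header. This is a guess at the general
technique, not how NPR.org or any other site actually implements it;
CRAWLER_TOKENS, is_crawler, and select_body are hypothetical names.

```python
# Assumption: crawlers are recognized by a substring of the User-Agent header.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent):
    """Return True if the User-Agent string looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def select_body(user_agent, full_text, teaser):
    """Give crawlers the full text and everyone else the teaser."""
    return full_text if is_crawler(user_agent) else teaser


if __name__ == "__main__":
    full = "Complete transcript text"
    teaser = "First paragraph only; purchase required for the rest"
    print(select_body("Mozilla/5.0 (compatible; Googlebot/2.1)", full, teaser))
    print(select_body("Mozilla/5.0 (Windows; U)", full, teaser))
```

The User-Agent header is supplied by the client and can be set to
anything, so a check like this only identifies clients that announce
themselves honestly.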

But your example is a good one: when information is opened up to  
Google, it's usually findable there.


Casey Bisson
__________________________________________

Information Architect
Plymouth State University
Plymouth, New Hampshire
http://MaisonBisson.com/
ph: 603-535-2256


