[Web4lib] new scholarly content in (vanilla) Google?
Casey Bisson
cbisson at plymouth.edu
Mon Feb 25 12:23:51 EST 2008
On Feb 25, 2008, at 11:23 AM, Rudy Leon wrote:
> If I recall correctly, Google spiders were not allowed to crawl sites
> they did not have full access to, and thus Google Scholar was born to
> have different rules and allow the spidering of journal home pages and
> other password blocked sites. However, this morning's results were in
> plain old vanilla google, and also failed to include the "Full text @"
> links we see from Schoogle results.
JSTOR and Muse, as well as a number of other sites, specifically
allow Google's crawlers to index their sites. You can see some of
JSTOR's instructions to Google and other crawlers in its robots.txt:
http://www.jstor.org/robots.txt
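To illustrate how robots.txt can single out one crawler, here is a minimal sketch using Python's standard urllib.robotparser module. The rules below are a made-up example, not the actual contents of JSTOR's file: they give Googlebot broad access and shut out every other crawler. The paths "/search" and "/stable/12345" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; NOT the real contents of jstor.org/robots.txt.
# Googlebot may fetch everything except /search; all other agents are blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Googlebot", "SomeOtherBot"):
        # can_fetch() applies the rules for the matching User-agent group.
        print(agent, rp.can_fetch(agent, "/stable/12345"))
```

A real crawler would instead call set_url() with the site's robots.txt address and then read() to load the live rules before checking can_fetch().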
They might also check whether a request comes from a crawler and
change the content they return accordingly. A good example is
NPR.org, which lets Google index its full-text transcripts but
requires users to pay for them.
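The crawler-detection idea can be sketched in a few lines. This is a minimal illustration of the general technique, not NPR's actual implementation: it assumes detection by a substring match on the User-Agent header, and the names CRAWLER_TOKENS, is_crawler and render_transcript are hypothetical. Because the User-Agent header is easy to spoof, a production site would likely verify the client by other means as well.

```python
# Hypothetical list of crawler signatures; a real site would likely keep a
# longer list. User-Agent strings are trivially spoofed, so this check alone
# is not a reliable way to identify a genuine search-engine crawler.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def render_transcript(full_text: str, user_agent: str, preview_chars: int = 80) -> str:
    """Serve the full transcript to crawlers; serve others a preview and a paywall notice."""
    if is_crawler(user_agent):
        return full_text
    return full_text[:preview_chars] + "... [Purchase the full transcript to continue reading.]"


if __name__ == "__main__":
    transcript = "Host: Good morning. " * 10
    google_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(render_transcript(transcript, google_ua))
    print(render_transcript(transcript, "Mozilla/5.0 (Windows NT 10.0)"))
```

The result is that the indexed copy contains the full text while a human visitor following the search result sees only the preview.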
But your example is a good one: when information is opened up to
Google, it's usually findable there.
Casey Bisson
__________________________________________
Information Architect
Plymouth State University
Plymouth, New Hampshire
http://MaisonBisson.com/
ph: 603-535-2256