[Web4lib] Which databases can Google Scholar crawl?

Roy Tennant tennantr at oclc.org
Thu Feb 21 11:48:39 EST 2008


Kathryn,
You've discovered why trying to limit to particular dates in Google Scholar
is an exercise in futility. You have to remember that GS is not searching
controlled metadata, but full-text. Do any search you wish, limiting to a
particular year. Then start clicking through to the actual articles. You
will discover that dates as displayed in GS search results are almost a
complete fiction. For example, I did a search, limiting to 2004 in the
advanced search. I clicked on an article that displayed 2005 in the search
results, and it turned out the article was actually published in 1990! In a
nutshell, Google engineers have disdained metadata for full-text searching
and this is what you get.
Roy


On 2/21/08 5:53 AM, "Kathryn Silberger" <Kathryn.Silberger at marist.edu>
wrote:

> 
> Roy has asked an interesting question about how completely and frequently
> Google crawls its targets.  Here are a few factoids of interest:
> 
> Today a Jstor search returns about 1,810,000 items; limited to "since 2004"
> - 2,950 items; "since 2003" - 10,600 items; "since 2002" - 18,700 items.  (I
> began with 2004 because of The Wall).  Also today, at
> http://www.jstor.org.online.library.marist.edu/about/facts.html , Jstor says
> it has 1,850,206 articles online.
> 
> I don't know how many duplicate entries there are in GS for Jstor, but I
> bet there are some.  Nonetheless, it looks like GS indexes something in the
> neighborhood of 90 - 95% of Jstor.
> 
> Today Blackwell Synergy has 912,000 items retrieved in a GS search. On
> their website they say they have "over a million articles online".  Limiting
> that GS search to "since 2008" - 1,950 articles are retrieved; "since 2007"
> - 27,600; and "since 2006" - 42,700.
> 
> When I do a "DM Silberger" search (my husband), the Google count is 326.
> My husband has only written about 3 or 4 dozen scholarly articles.  The
> rest of what is coming up seems to be items citing his papers.  Does
> anyone know if the "inurl:" search in GS is strictly limited to the URL
> of the main article, or will it pull up any URL listed in the full text of
> the entry?
> 
> 
> Katy
> 
> Kathryn K. Silberger
> Automation Resources Librarian
> James A. Cannavino Library
> Marist College
> 3399 North Road
> Poughkeepsie, NY  12601
> Kathryn.Silberger at marist.edu
> (845) 575-3000 x.2419
> 
> _______________________________________________
> Web4lib mailing list
> Web4lib at webjunction.org
> http://lists.webjunction.org/web4lib/
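
The coverage estimate in the quoted message can be checked with a little arithmetic. The sketch below is only a rough illustration using the counts quoted above; the function name `coverage` is made up for this example, and the Blackwell Synergy total is assumed to be exactly 1,000,000 because the publisher only says "over a million", so that ratio is an upper bound. Neither figure corrects for duplicate entries in GS, which may be why the quoted estimate of 90 - 95% for Jstor is lower than the raw ratio.

```python
def coverage(gs_hits: int, publisher_total: int) -> float:
    """Raw ratio of Google Scholar hits to the publisher's stated total."""
    return gs_hits / publisher_total


# Counts as quoted in the message above (February 2008).
jstor = coverage(1_810_000, 1_850_206)

# Assumption: "over a million" is treated as exactly 1,000,000,
# so this ratio can only overstate the real coverage.
blackwell_upper = coverage(912_000, 1_000_000)

print(f"Jstor: {jstor:.1%} (before removing duplicates)")
print(f"Blackwell Synergy: at most {blackwell_upper:.1%}")
```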
