[Web4lib] Which databases can Google Scholar crawl?

Kathryn Silberger Kathryn.Silberger at marist.edu
Wed Feb 20 13:40:54 EST 2008


On the later entries I made this morning, I noted how many items GS found
with the domain and put the date.  Granted, at this point we are using a
bomb when we really need a flyswatter, but we are at the beginning.  I
included the item count and date thinking that periodic re-checks might
reveal something. If the item count is measured in 10s rather than 10,000s,
you do get a rough sense of the completeness.
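The count-plus-date bookkeeping described above can be sketched in a few lines. This is a hypothetical illustration, not anything from the wiki itself: the entry format, the function names, and the order-of-magnitude labels are all assumptions, and the hit counts are made-up figures.

```python
from datetime import date

def coverage_note(domain: str, hit_count: int, checked: date) -> dict:
    """Record one entry: domain probed, GS hit count, and date checked."""
    return {"domain": domain, "hits": hit_count, "date": checked.isoformat()}

def rough_completeness(hit_count: int) -> str:
    """Classify coverage by order of magnitude, as the post suggests:
    counts in the 10s hint at sparse indexing, counts in the 10,000s
    and up suggest fairly thorough coverage."""
    if hit_count < 100:
        return "sparse"
    if hit_count < 10_000:
        return "partial"
    return "substantial"

# Hypothetical figures for illustration only.
entries = [
    coverage_note("ieeexplore.ieee.org", 900_000, date(2008, 2, 20)),
    coverage_note("example-vendor.com", 40, date(2008, 2, 20)),
]
for e in entries:
    print(e["domain"], rough_completeness(e["hits"]))
```

Keeping the date alongside the count is what makes the periodic re-checks useful: two notes for the same domain taken weeks apart can show whether coverage is growing.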

I think the oldest date included may be easier to determine than the most
recent date.

By the way, Roy, do you know if Google has any agreements with the
California system about crawling their publications?
Katy

Kathryn K. Silberger
Automation Resources Librarian
James A. Cannavino Library
Marist College
3399 North Road
Poughkeepsie, NY  12601
Kathryn.Silberger at marist.edu
(845) 575-3000 x.2419


                                                                           
From: Will Kurt <wkurt at bbn.com>
Sent by: web4lib-bounces at webjunction.org
To: Roy Tennant <tennantr at oclc.org>, <web4lib at webjunction.org>
Date: 02/20/2008 11:13 AM
Subject: Re: [Web4lib] Which databases can Google Scholar crawl?

I think you bring up some good points, Roy.

The methodology proposed on the wiki is actually to search for
results from a particular domain; for example, searching
inurl:ieeexplore.ieee.org/ yields 900k results, so that leaves me
fairly confident that the IEEE is pretty thoroughly indexed.  And
there is actually more value in knowing what is not indexed: being
able to tell users 'you definitely won't find information from X in
GS' would be very useful.
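The domain probe above can be turned into a URL programmatically. A minimal sketch, with the caveat that Google Scholar has no official API: the scholar.google.com endpoint and the `q` parameter are assumptions based on the public search form, and the result counts would still have to be read off the page by hand.

```python
from urllib.parse import urlencode

def scholar_query_url(domain: str) -> str:
    """Build a Google Scholar search URL for an inurl: domain probe.

    The endpoint and parameter name are assumptions inferred from the
    public search form, not a documented interface.
    """
    query = f"inurl:{domain}/"
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})

print(scholar_query_url("ieeexplore.ieee.org"))
```

Generating the probe URLs for a whole list of vendor domains would at least make the manual checking repeatable from one re-check to the next.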

The great thing about a wiki is that it's flexible enough to change
dynamically to meet new requirements as they come up.

This is certainly far from the 'perfect' solution, but I think that
there is a real need in libraries for people to understand the value
of iterative problem solving.

Rather than meeting for months to formulate the 'perfect' solution,
which will never actually be implemented, solve what can be easily
solved now, and then go back and look at how you can improve that
based on your remaining needs, gaining some immediate value at each step.

--Will

At 10:26 AM 2/20/2008, Roy Tennant wrote:
>Doing this kind of thing is all well and good, but in the end it will not be
>very reliable unless coverage can be verified beyond "I found an article
>that appears to come from X". Also, when items appear in Google Scholar can
>be an issue. If they are available in other indexes significantly sooner
>than in GS, that would be worth knowing as well.
>Roy
>
>
>On 2/20/08 6:40 AM, "Kathryn Silberger" <Kathryn.Silberger at marist.edu>
>wrote:
>
> >
> > Will:
> >
> >       Very cool!  Thanks so much for doing that.  I made a humble little
> > contribution this morning, and it seems like a fairly easy way of beginning
> > the list.  Right now we are going after the low hanging fruit.  In a few
> > weeks, creativity will be required to determine new entries.
> >
> >       Thanks for doing that.
> >
> > Katy
> >
> > Kathryn K. Silberger
> > Automation Resources Librarian
> > James A. Cannavino Library
> > Marist College
> > 3399 North Road
> > Poughkeepsie, NY  12601
> > Kathryn.Silberger at marist.edu
> > (845) 575-3000 x.2419
> >
> >
> >
> > From: Will Kurt <wkurt at bbn.com>
> > To: Kathryn Silberger <Kathryn.Silberger at marist.edu>, web4lib at webjunction.org
> > Date: 02/19/2008 05:10 PM
> > Subject: Re: Fw: [Web4lib] Which databases can Google Scholar crawl?
> >
> > At 03:11 PM 2/19/2008, Will Kurt wrote:
> >> anyone else?
> > ... or that someone could be me :)
> > I've taken a few minutes and quickly set up a phpWiki and put it on a
> > site that I run outside of work:
> >
> > http://lib-bling.com/scholar/index.php?GoogleScholar
> >
> > I put in a few entries mainly as a demo, but please add and see if we
> > can get a good list going.  Knowing what is not indexed is just as
> > important as knowing what is. If enough people contribute, this could
> > be really useful.
> > --Will
> >
> >
> >
> > _______________________________________________
> > Web4lib mailing list
> > Web4lib at webjunction.org
> > http://lists.webjunction.org/web4lib/
>
>--
>
>




