[Web4lib] google & library catalogs

Binkley, Peter Peter.Binkley at ualberta.ca
Wed Apr 12 13:34:09 EDT 2006


I'd never noticed before, but Google lists 377,000 pages from our opac:

http://www.google.com/search?hl=en&hs=pgN&lr=&client=firefox-a&rls=org.mozilla:en-US:official&q=site:ualweb.library.ualberta.ca

The ones I sampled were all searches by call number or issn, suggesting
that Google followed external links from our new books or ejournals
pages, but not internal links from the opac records themselves. Their
spidering algorithm must be smart enough to avoid getting lost in the
thickets of a highly-interlinked site with relatively few in-links.

The spider probably did attempt to follow external links in the opac,
since it has indexed 358,000 links to our EZProxy server (though a lot
of these, maybe all, could have come from elsewhere on our site). We'll
be replacing our ejournals' 856's with OpenURLs soon; it will be
interesting to see if Google starts trying to spider our resolver.
Currently our resolver has no hits in Google, but I'd have to check
whether there's a robots.txt that is keeping Google out.
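
For anyone who wants to run the same check on their own resolver, here
is a minimal sketch using Python's standard urllib.robotparser. The
robots.txt content and the example.org URLs are made up for
illustration; they are not our actual resolver's settings.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that keeps every crawler out of /resolver/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /resolver/
"""

if __name__ == "__main__":
    # Hypothetical URLs standing in for a resolver link and a new books page.
    for url in (
        "http://example.org/resolver/openurl?issn=1234-5678",
        "http://example.org/newbooks/",
    ):
        print(url, crawler_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", url))
```

To test a live site instead, RobotFileParser also has set_url() and
read(), which fetch the robots.txt file over the network.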

Interestingly, the "Similar pages" link never comes up with anything. I
would have thought the metadata would provide a distinctive enough
fingerprint to pull up catalogue records for the same book at other
libraries; but perhaps all the institution-specific language on the page
muddies the waters too much.


Peter

Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: peter.binkley at ualberta.ca



-----Original Message-----
From: web4lib-bounces at webjunction.org
[mailto:web4lib-bounces at webjunction.org] On Behalf Of Joleen Crockett
Sent: Wednesday, April 12, 2006 12:01 AM
To: web4lib at webjunction.org
Subject: RE: [Web4lib] google & library catalogs

Not Google, but apparently Yahoo. Entering the catalog address as a
search (minus http://) brings up specific records from III
catalogs--especially those with a Kids catalog. Titles don't seem to be
limited to Juvenile.  Records often appear in search results on the
second or third page.



Joleen


Joleen Crockett
Adult Services Librarian
Tempe Public Library
Tempe, AZ

-----Original Message-----
From: web4lib-bounces at webjunction.org
[mailto:web4lib-bounces at webjunction.org]
On Behalf Of Sara Brownmiller
Sent: Monday, April 10, 2006 2:56 PM
To: web4lib at webjunction.org
Subject: [Web4lib] google & library catalogs



There is interest here in allowing google (google the search engine, not
google scholar) to spider, or crawl, our library catalog.  Since many
students start their research in google, they might identify information
easily available to them.  It would also help increase exposure to
materials in our digital collections and our special collections and
manuscripts.

Has anyone allowed a search engine to crawl their catalog?  What impact
did it have on the performance?  Does your library have a policy about
search engines crawling your catalog?  What factors influenced your
decision?

I would also be very interested in locating some records in google that
came from a library catalog to see how the user is linked to the catalog
or to see how the material is identified with a specific institution.

thanks, Sara

Sara Brownmiller			University of Oregon Libraries
Director, Library Systems 		1299 University of Oregon
Women's Studies Librarian		Eugene, OR  97403-1299
					541/346-2368 (voice)
snb at uoregon.edu				541/346-3485 (fax)
_______________________________________________
Web4lib mailing list
Web4lib at webjunction.org
http://lists.webjunction.org/web4lib/


