[Web4lib] google & library catalogs

Tue Apr 11 17:11:49 EDT 2006

> The reality is that our OPACs aren't designed for spiders

Exactly: our systems fail us in the Google Economy.

We can fix that.
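
One concrete route is the site-map idea raised in the quoted messages below. Here is a minimal sketch of generating one for catalog records. It assumes the sitemaps.org XML format and a hypothetical record URL pattern (catalog.example.edu/record/ID); neither the function name nor the URLs come from any real catalog system.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (an assumption about which
# site-map format a search engine will accept).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(record_ids, base_url="http://catalog.example.edu/record/"):
    """Return sitemap XML with one <url> entry per catalog record.

    base_url is a hypothetical, stable record URL prefix; a real OPAC
    would need persistent (non-session) record URLs for this to help.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for record_id in record_ids:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}{record_id}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Three made-up record IDs, purely for illustration.
    print(build_sitemap([1001, 1002, 1003]))
```

The point of the sketch is that a flat list of record URLs gives a spider an entry point to every record without it having to find its way through the search interface.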

On Apr 11, 2006, at 5:04 PM, Adam Brin wrote:

> True ... but you're giving too much credit to the intelligence of the
> spiders.  Our OPAC was open to search engines for at least 6 months,
> and they could be seen in our stats constantly until we cut them off.  In
> looking at the OPAC search stats, they are so heavily weighted toward a
> very small set of resources that it seems pretty clear that the search
> engines are not really doing a good job of spidering.
>
> The reality is that our OPACs aren't designed for spiders; there are
> tremendously more valuable site structures that would enable them to be
> spidered.  The relationships are there, but the URL and file structure
> isn't.  Special site-maps might help.  I do like the idea of increasing
> the relevance of a "Trusted Provider" such as OWC though.  It would be
> really nice to see all of their 'coverage data' be enabled as an
> alternate relevancy model too; combined with things like citation data,
> this could be pretty powerful.
>
> - adam
>
> On Tue, 11 Apr 2006, Casey Bisson wrote:
>
>>
>>
>>> having search engines crawl library catalogs
>>> is technically problematic in most cases
>>
>> Technical problems, yes, but not insoluble problems.
>>
>> Every link into the catalog (and we need to support those for a
>> number of reasons) becomes a new access point. From there a spider
>> can find lists of other works by the same author(s) and other works
>> within the same subject(s). Depending on how deep the spider crawls,
>> we may quickly find huge numbers of indexed resources from a small
>> number of inbound links. And when that fails, search engines offer us
>> other solutions, including Google's site maps and others.
>>
>> Now, understanding that links are the lucre of the Google Economy,
>> let me pose this question: what happens when all of our distributed,
>> indexable catalogs sport links from their records to the OpenWorldCat
>> record for those items? How much more relevant does OWC then become?
>> How much more findable do all our resources become?
>>
>> --Casey
>>
>>
>> On Apr 11, 2006, at 4:33 PM, Adam Brin wrote:
>>
>>> A note on practicality:
>>>
>>> Whether intentional or not, having search engines crawl library
>>> catalogs is technically problematic in most cases.  From experience,
>>> we've had the big three crawl our catalog and, to be quite honest,
>>> they get tied up in knots.  To be more specific, search engines (a)
>>> have a hard time crawling catalogs because they're webs of highly
>>> interconnected pages [one might argue even more maze-like than other
>>> sites] and (b) most don't have that many entry points in.  A search
>>> engine doesn't use a 'search box' on your site, and must be led into
>>> the catalog via a set of links to a record, or record set.
>>>
>>> Personally:
>>> I really agree with Roy about working through groups such as OCLC
>>> and the OpenWorldCat implementation to get the library 'face time'.
>>> In reality, the more catalogs that open to Google, the harder it
>>> would be for our catalog records to be found.  WorldCat has more
>>> cachet in what Casey calls the Google economy than any one of us
>>> could separately, and it's working on our behalf.
>>>
>>> - adam brin
>>> -------------------------------------
>>> Tri-Colleges Systems Coordinator
>>> Bryn Mawr | Haverford | Swarthmore
>>> http://tripod.brynmawr.edu
>>>
>>>  On Tue, 11 Apr 2006, Casey Bisson wrote:
>>>
>>>>
>>>>
>>>> David,
>>>>
>>>> your suggestion that we build library systems that can be easily
>>>> integrated within other systems such as learning management systems
>>>> is well put.
>>>>
>>>> The time has passed where library activities were restricted to those
>>>> occurring within the library, and we now have to think about how our
>>>> resources will be used in a variety of electronic environments. The
>>>> cornerstones of the Google economy -- indexability and linkability --
>>>> do well to serve our needs not only in LMSs and academic portals, but
>>>> also in our email or IM communications and in environments even
>>>> further afield, such as blogs or Facebook.
>>>>
>>>> Thank you,
>>>>
>>>> --Casey
>>>>
>>>>
>>>>
>>>>
>>>> On Apr 11, 2006, at 3:33 PM, David Walker wrote:
>>>>
>>>>> Sara,
>>>>>
>>>>> Putting your digital collections aside for a second, you might want
>>>>> to consider whether Google really is the best mechanism for exposing
>>>>> your own users at the University of Oregon to your collections.
>>>>>
>>>>> Google is, of course, popular and sexy; and no doubt all of your
>>>>> users start their research there or in another search engine.
>>>>>
>>>>> But throwing your catalog records into the Great Big Index Of Stuff
>>>>> is kind of like your local mom-and-pop supermarket using national
>>>>> television networks to advertise a sale on oranges.  You won't get as
>>>>> big a reach advertising in the local newspaper, but focusing your
>>>>> advertising on people who will actually find it useful and meaningful
>>>>> can be far more effective.
>>>>>
>>>>> Given limited budgets and resources, I would personally opt to invest
>>>>> resources into integrating your collections into whatever learning
>>>>> management system(s) and/or portal you all have there at the
>>>>> University of Oregon.  Those systems are *heavily* used by your core
>>>>> audience, and the current level of integration between library
>>>>> systems and learning management systems could be greatly improved.
>>>>>
>>>>> You may not get as many visits as from a high placement in a Google
>>>>> result set (although most of your records probably won't appear high
>>>>> enough to be effective anyway), but visits mean nothing unless they
>>>>> actually result in check-outs.
>>>>>
>>>>> --Dave
>>>>>
>>>>> =========================
>>>>> David Walker
>>>>> Web Development Librarian
>>>>> Library, Cal State San Marcos
>>>>> 760-750-4379
>>>>> http://public.csusm.edu/dwalker
>>>>> =========================
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: web4lib-bounces at webjunction.org
>>>>> [mailto:web4lib-bounces at webjunction.org] On Behalf Of Casey Bisson
>>>>> Sent: Tuesday, April 11, 2006 10:45 AM
>>>>> To: web4lib at webjunction.org
>>>>> Cc: Sara Brownmiller
>>>>> Subject: Re: [Web4lib] google & library catalogs
>>>>>
>>>>> Sara,
>>>>>
>>>>> With more than 80 million Americans searching the web on any given
>>>>> day, and major search engines handling five billion searches per
>>>>> month, it's hard to imagine not wanting to make library resources
>>>>> findable and available to those users.
>>>>>
>>>>> Google scares and confuses most of us, but I like to describe it as a
>>>>> giant OPAC with cataloging rules much like those we're already
>>>>> familiar with (even if those rules are different from what we're
>>>>> familiar with). Unfortunately, many of our systems are built in ways
>>>>> that contradict those rules and make our content difficult to index
>>>>> and find.
>>>>>
>>>>> But it's a challenge we can meet. And considering that a good number
>>>>> of those billions of monthly searches could benefit from the
>>>>> knowledge available within libraries, it's a challenge that's worth
>>>>> our effort.
>>>>>
>>>>> That's the philosophy; here's some practice:
>>>>>
>>>>> WPopac[1] is my project to improve the findability of our resources
>>>>> by following the rules of the Google Economy[2]. In doing so it's
>>>>> already highly ranked for at least one search[3], and the logs show
>>>>> that it's getting a large number of hits from search engines for
>>>>> terms like "di vinci code" (yes, note the misspelling) and "assisted
>>>>> suicide" along with a few hundred more. How many hits? In the less
>>>>> than three months that the prototype has been open to the public,
>>>>> it's received more than 550,000 page loads (that count excludes my
>>>>> own activity), about as many as the official Plymouth State
>>>>> University catalog received in 12 months last year.
>>>>>
>>>>> 1: http://maisonbisson.com/blog/post/11133/
>>>>>
>>>>> 2: http://en.wikipedia.org/wiki/Google_economy
>>>>>
>>>>> 3: http://www.google.com/search?q=joe+monninger
>>>>>
>>>>> Casey Bisson
>>>>> __________________________________________
>>>>>
>>>>> e-Learning Application Developer
>>>>> Plymouth State University
>>>>> Plymouth, New Hampshire
>>>>> http://oz.plymouth.edu/~cbisson/
>>>>> ph: 603-535-2256
>>>>>
>>>>>
>>>>> On Apr 10, 2006, at 5:55 PM, Sara Brownmiller wrote:
>>>>>
>>>>>>
>>>>>> There is interest here in allowing Google (Google the search engine,
>>>>>> not Google Scholar) to spider, or crawl, our library catalog.  Since
>>>>>> many students start their research in Google, they might identify
>>>>>> information easily available to them.  It would also help increase
>>>>>> exposure to materials in our digital collections and our special
>>>>>> collections and manuscripts.
>>>>>>
>>>>>> Has anyone allowed a search engine to crawl their catalog?  What
>>>>>> impact did it have on the performance?  Does your library have a
>>>>>> policy about search engines crawling your catalog?  What factors
>>>>>> influenced your decision?
>>>>>>
>>>>>> I would also be very interested in locating some records in Google
>>>>>> that came from a library catalog to see how the user is linked to
>>>>>> the catalog or to see how the material is identified with a specific
>>>>>> institution.
>>>>>>
>>>>>> thanks, Sara
>>>>>>
>>>>>> Sara Brownmiller			University of Oregon Libraries
>>>>>> Director, Library Systems 		1299 University of Oregon
>>>>>> Women's Studies Librarian		Eugene, OR  97403-1299
>>>>>> 					541/346-2368 (voice)
>>>>>> snb at uoregon.edu				541/346-3485 (fax)
>>>>>> _______________________________________________
>>>>>> Web4lib mailing list
>>>>>> Web4lib at webjunction.org
>>>>>> http://lists.webjunction.org/web4lib/
>>>>>
>>>>
>>
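
The claim earlier in the thread, that a spider entering through a single record link can reach other works by the same author or subject and so index many records from few inbound links, can be illustrated with a depth-limited breadth-first crawl. This is only a sketch over a made-up link graph: the record names, the `links` mapping, and the function name are all hypothetical, and real spiders apply their own crawl-depth and politeness rules.

```python
from collections import deque


def reachable_records(links, entry_points, max_depth):
    """Return the set of records a crawler could reach.

    links maps each record to the records it links to (for example via
    "other works by this author" or "same subject" links).
    entry_points are the records that have inbound links from outside.
    max_depth is how many links deep the crawler is willing to follow.
    """
    seen = set(entry_points)
    queue = deque((record, 0) for record in entry_points)
    while queue:
        record, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for neighbour in links.get(record, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


if __name__ == "__main__":
    # Toy catalog: six records joined by author/subject links.
    links = {
        "A": ["B", "C"],
        "B": ["A", "D"],
        "C": ["A", "E"],
        "D": ["B"],
        "E": ["C", "F"],
        "F": ["E"],
    }
    for depth in range(4):
        found = reachable_records(links, ["A"], depth)
        print(depth, sorted(found))
```

With one entry point, the number of reachable records grows with crawl depth, which is the optimistic case. The counterpoint from the thread, that spiders concentrated on a very small set of resources in practice, corresponds to a shallow depth limit or to too few entry points.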