[Web4lib] google & library catalogs

Adam Brin abrin at brynmawr.edu
Tue Apr 11 17:04:36 EDT 2006


True ... but you're giving too much credit to the intelligence of the
spiders.  Our OPAC was open to search engines for at least 6 months, and
their traffic showed up constantly in our stats until we cut them off.  In
the OPAC search stats, that traffic is so heavily weighted toward a very
small set of resources that it seems pretty clear the search engines are
not really doing a good job of spidering.

The reality is that our OPACs aren't designed for spiders; there are
much more valuable site structures that would enable them to be
spidered.  The relationships are there, but the URL and file structure
aren't.  Special site-maps might help.  I do like the idea of increasing
the relevance of "Trusted Providers" such as OWC though.  It would be
really nice to see all of their 'coverage data' enabled as an alternate
relevancy model too; combined with things like citation data, this could be
pretty powerful.
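
To make the site-map idea concrete, here is a minimal Python sketch, not
anything we actually run.  It assumes two things our OPACs may not offer:
a stable, session-free URL for each record (the catalog.example.edu
pattern below is hypothetical) and a plain list of record IDs exported
from the catalog.  The function name build_sitemap is also just
illustrative.

```python
from xml.sax.saxutils import escape

# Hypothetical URL pattern (an assumption): a real OPAC would need
# stable, session-free record links for this to work.
RECORD_URL = "http://catalog.example.edu/record/{record_id}"


def build_sitemap(record_ids):
    """Return a sitemap XML string with one <url> entry per catalog record."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for record_id in record_ids:
        # Escape &, < and > so the record URL is valid inside XML.
        loc = escape(RECORD_URL.format(record_id=record_id))
        lines.append("  <url><loc>{}</loc></url>".format(loc))
    lines.append("</urlset>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up record IDs, for illustration only.
    print(build_sitemap(["b1000001", "b1000002"]))
```

The point is that every record gets a flat entry point, so a spider does
not have to find its way through the web of author and subject links.  A
single sitemap file is limited to 50,000 URLs, so a full catalog would
need to be split across several files.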

- adam

On Tue, 11 Apr 2006, Casey Bisson wrote:

>
>
> > having search engines crawl library catalogs
> > is technically problematic in most cases
>
> Technical problems, yes, but not insoluble problems.
>
> Every link into the catalog (and we need to support those for a
> number of reasons) becomes a new access point. From there a spider
> can find lists of other works by the same author(s) and other works
> within the same subject(s). Depending on how deep the spider crawls,
> we may quickly find huge numbers of indexed resources from a small
> number of inbound links. And when that fails, search engines offer us
> other solutions, including Google's site maps and others.
>
> Now, understanding that links are the lucre of the Google Economy,
> let me pose this question: what happens when all of our distributed,
> indexable catalogs sport links from their records to the OpenWorldCat
> record for those items? How much more relevant does OWC then become?
> How much more findable do all our resources become?
>
> --Casey
>
>
> On Apr 11, 2006, at 4:33 PM, Adam Brin wrote:
>
> > A note on practicality:
> >
> > Whether intentional or not, having search engines crawl library catalogs
> > is technically problematic in most cases.  From experience, we've had the
> > big three crawl our catalog and to be quite honest, they get tied up in
> > knots.  To be more specific, search engines (a) have a hard time crawling
> > catalogs because they're webs of highly interconnected pages [one might
> > argue even more maze-like than other sites] and (b) most don't have that
> > many entry points in.  A search engine doesn't use a 'search box' on your
> > site, and must be led into the catalog via a set of links to a record, or
> > record set.
> >
> > Personally:
> > I really agree with Roy about working through groups such as OCLC and the
> > OpenWorldCat implementation to get the library 'face time'.  In reality,
> > the more catalogs that open to Google, the harder it would be for our
> > catalog records to be found.  WorldCat has more cachet in what Casey calls
> > the Google economy than any one of us could separately, and it's working
> > on our behalf.
> >
> > - adam brin
> > -------------------------------------
> > Tri-Colleges Systems Coordinator
> > Bryn Mawr | Haverford | Swarthmore
> > http://tripod.brynmawr.edu
> >
> >  On Tue, 11 Apr 2006, Casey Bisson wrote:
> >
> >>
> >>
> >> David,
> >>
> >> Your suggestion that we build library systems that can be easily
> >> integrated within other systems such as learning management systems
> >> is well put.
> >>
> >> The time has passed where library activities were restricted to those
> >> occurring within the library, and we now have to think about how our
> >> resources will be used in a variety of electronic environments. The
> >> cornerstones of the Google economy -- indexability and linkability --
> >> do well to serve our needs not only in LMSs and academic portals, but
> >> also in our email or IM communications and in environments even
> >> further afield, such as blogs or Facebook.
> >>
> >> Thank you,
> >>
> >> --Casey
> >>
> >>
> >>
> >>
> >> On Apr 11, 2006, at 3:33 PM, David Walker wrote:
> >>
> >>> Sara,
> >>>
> >>> Putting your digital collections aside for a second, you might want to
> >>> consider whether Google really is the best mechanism for exposing your
> >>> own users at the University of Oregon to your collections.
> >>>
> >>> Google is, of course, popular and sexy; and no doubt all of your users
> >>> start their research there or in another search engine.
> >>>
> >>> But throwing your catalog records into the Great Big Index Of Stuff is
> >>> kind of like your local mom-and-pop supermarket using national
> >>> television networks to advertise a sale on oranges.  You won't get as
> >>> big a reach advertising in the local newspaper, but focusing your
> >>> advertising on people who will actually find it useful and meaningful
> >>> can be far more effective.
> >>>
> >>> Given limited budgets and resources, I would personally opt to invest
> >>> resources into integrating your collections into whatever learning
> >>> management system(s) and/or portal you all have there at the University
> >>> of Oregon.  Those systems are *heavily* used by your core audience, and
> >>> the current level of integration between library systems and learning
> >>> management systems could be greatly improved.
> >>>
> >>> You may not get as many visits as from a high placement in a Google
> >>> result set (although most of your records probably won't appear high
> >>> enough to be effective anyway), but visits mean nothing unless they
> >>> actually result in check-outs.
> >>>
> >>> --Dave
> >>>
> >>> =========================
> >>> David Walker
> >>> Web Development Librarian
> >>> Library, Cal State San Marcos
> >>> 760-750-4379
> >>> http://public.csusm.edu/dwalker
> >>> =========================
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: web4lib-bounces at webjunction.org
> >>> [mailto:web4lib-bounces at webjunction.org] On Behalf Of Casey Bisson
> >>> Sent: Tuesday, April 11, 2006 10:45 AM
> >>> To: web4lib at webjunction.org
> >>> Cc: Sara Brownmiller
> >>> Subject: Re: [Web4lib] google & library catalogs
> >>>
> >>> Sara,
> >>>
> >>> With more than 80 million Americans searching the web on any given
> >>> day, and major search engines handling five billion searches per
> >>> month, it's hard to imagine not wanting to make library resources
> >>> findable and available to those users.
> >>>
> >>> Google scares and confuses most of us, but I like to describe it as a
> >>> giant OPAC with cataloging rules much like those we're already
> >>> familiar with (even if the rules themselves are different from what
> >>> we're used to). Unfortunately, many of our systems are built in ways
> >>> that contradict those rules and make our content difficult to index
> >>> and find.
> >>>
> >>> But it's a challenge we can meet. And considering that a good number
> >>> of those billions of monthly searches could benefit from the
> >>> knowledge available within libraries, it's a challenge that's worth
> >>> our effort.
> >>>
> >>> That's the philosophy, here's some practice:
> >>>
> >>> WPopac[1] is my project to improve the findability of our resources
> >>> by following the rules of the Google Economy[2]. In doing so it's
> >>> already highly ranked for at least one search[3], and the logs show
> >>> that it's getting a large number of hits from search engines for
> >>> terms like "di vinci code" (yes, note the misspelling) and "assisted
> >>> suicide" along with a few hundred more. How many hits? In the less
> >>> than three months that the prototype has been open to the public,
> >>> it's received more than 550,000 page loads (that count excludes my
> >>> own activity), about as many as the official Plymouth State University
> >>> catalog received in 12 months last year.
> >>>
> >>> 1: http://maisonbisson.com/blog/post/11133/
> >>>
> >>> 2: http://en.wikipedia.org/wiki/Google_economy
> >>>
> >>> 3: http://www.google.com/search?q=joe+monninger
> >>>
> >>> Casey Bisson
> >>> __________________________________________
> >>>
> >>> e-Learning Application Developer
> >>> Plymouth State University
> >>> Plymouth, New Hampshire
> >>> http://oz.plymouth.edu/~cbisson/
> >>> ph: 603-535-2256
> >>>
> >>>
> >>> On Apr 10, 2006, at 5:55 PM, Sara Brownmiller wrote:
> >>>
> >>>>
> >>>> There is interest here in allowing Google (Google the search engine, not
> >>>> Google Scholar) to spider, or crawl, our library catalog.  Since many
> >>>> students start their research in Google, they might identify information
> >>>> easily available to them.  It would also help increase exposure to
> >>>> materials in our digital collections and our special collections and
> >>>> manuscripts.
> >>>>
> >>>> Has anyone allowed a search engine to crawl their catalog?  What impact
> >>>> did it have on the performance?  Does your library have a policy about
> >>>> search engines crawling your catalog?  What factors influenced your
> >>>> decision?
> >>>>
> >>>> I would also be very interested in locating some records in Google that
> >>>> came from a library catalog to see how the user is linked to the catalog
> >>>> or to see how the material is identified with a specific institution.
> >>>>
> >>>> thanks, Sara
> >>>>
> >>>> Sara Brownmiller			University of Oregon Libraries
> >>>> Director, Library Systems 		1299 University of Oregon
> >>>> Women's Studies Librarian		Eugene, OR  97403-1299
> >>>> 					541/346-2368 (voice)
> >>>> snb at uoregon.edu				541/346-3485 (fax)
> >>>> _______________________________________________
> >>>> Web4lib mailing list
> >>>> Web4lib at webjunction.org
> >>>> http://lists.webjunction.org/web4lib/
> >>>
>


