[Web4lib] Re: Google Search Appliance and OPACs

Thu Feb 7 05:23:13 EST 2008

Dear all,

We did something like this. We dumped our catalog 
(http://aleph.vkol.cz, about 1 million records) 
into static HTML pages so that crawlers could 
fetch them. Every static page links to the live 
record in the catalog.
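
As an illustration only (this is not the script used for the dump), a static page for one record could be rendered roughly as below. The function name, the field names, and the example URL are hypothetical; the real pages and the real link format of the catalog may differ.

```python
from html import escape

def render_record_page(title, author, live_url):
    """Return a small static HTML page for one catalog record.

    `live_url` is the address of the live record in the catalog;
    its format is an assumption made for this sketch.
    """
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<p>{escape(author)}</p>\n"
        # The link back to the live record is what sends visitors
        # from the search engine into the actual catalog.
        f'<p><a href="{escape(live_url)}">View in catalog</a></p>\n'
        "</body></html>\n"
    )

if __name__ == "__main__":
    print(render_record_page(
        "Example title", "Example author",
        "http://example.org/record?id=1"))
```

Escaping the title, author, and URL keeps characters such as `&` or `<` in bibliographic data from breaking the page.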

First we built a tree structure linking all the 
records, so that robots could start at the home 
page (http://aleph.vkol.cz/pub) and find the rest 
of the records. This worked, but it took Google 
about 2 months to get all the records.
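
A minimal sketch of such a link tree, assuming a fixed number of links per index page (the fan-out of 100 and the page names are assumptions, not what was actually used). Record pages are grouped into index pages, those into higher-level index pages, and so on until a single home page remains.

```python
def build_link_tree(leaf_urls, fanout=100, prefix="index"):
    """Group URLs into index pages with at most `fanout` links each,
    level by level, until one root page remains.

    Returns a dict mapping page name -> list of links on that page.
    The root page is named f"{prefix}.html".
    """
    pages = {}
    level = 0
    current = list(leaf_urls)
    while True:
        chunks = [current[i:i + fanout]
                  for i in range(0, len(current), fanout)]
        if len(chunks) <= 1:
            # Everything fits on one page: this is the root.
            pages[f"{prefix}.html"] = chunks[0] if chunks else []
            return pages
        names = []
        for n, chunk in enumerate(chunks):
            name = f"{prefix}-{level}-{n}.html"
            pages[name] = chunk
            names.append(name)
        # The next level up links to the pages just created.
        current = names
        level += 1

if __name__ == "__main__":
    tree = build_link_tree([f"rec{i}.html" for i in range(250)])
    print(tree["index.html"])
```

With a fan-out of 100, about 1 million records would need 10,000 first-level index pages, 100 second-level pages, and one home page, so every record is three links away from the home page. A crawler still has to discover each level before it can reach the next.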

So we switched to a sitemap solution 
(http://aleph.vkol.cz/sitemap.xml) and Google 
crawled and indexed everything in 2 weeks.
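
For illustration, a sitemap for a large catalog could be generated along these lines. The sitemap protocol allows at most 50,000 URLs per sitemap file, so about 1 million records need at least 20 files plus a sitemap index that lists them. The file names and the example base URL below are assumptions; this is a sketch, not the script behind the sitemap above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemap protocol

def make_urlset(urls):
    """Return one sitemap file (a <urlset>) as an XML string."""
    root = ET.Element("urlset", xmlns=NS)
    for u in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = u
    return ET.tostring(root, encoding="unicode")

def make_sitemaps(urls, base, max_urls=MAX_URLS):
    """Split `urls` into sitemap files and build an index for them.

    Returns a dict mapping file name -> XML content. `base` is the
    URL prefix under which the files will be served (hypothetical).
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, start in enumerate(range(0, len(urls), max_urls)):
        name = f"sitemap-{n}.xml"
        files[name] = make_urlset(urls[start:start + max_urls])
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = base + name
    files["sitemap.xml"] = ET.tostring(index, encoding="unicode")
    return files

if __name__ == "__main__":
    records = [f"http://example.org/pub/{i}.html" for i in range(5)]
    for name, xml in make_sitemaps(records, "http://example.org/",
                                   max_urls=2).items():
        print(name, len(xml))
```

Unlike the link tree, the sitemap hands the crawler the full list of record URLs up front, so nothing has to be discovered level by level.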

Some stats say we got about 2000 new visitors 
every day, with an 80% bounce rate. Obviously 
there are many follow-up questions (the world is 
not our target audience, so why publish the 
catalog in Google instead of local search 
engines, etc.), but this was more or less just an 
experiment.

Other crawlers (Yahoo, MSN) do not match Google's 
performance and do not work with sitemap files 
efficiently.

BR, Martin

On 6 Feb 2008 at 21:27, Tim Spalding wrote:

> Has anyone tried just making a HUGE page of links and putting it
> somewhere Google will find it? Almost all OPACs allow direct links to
> records, by ISBN or something else. On a *few*-I've seen it on
> HiP-spidering this way causes serious sessions issues. (LibraryThing
> made this mistake once.) But it might be a way to get data into
> Google.
> 
> Tim

-- 
Ing. Martin Vojnar, Research Library Olomouc, 
Czech Republic
phone://+420 585 205 352
http://www.vkol.cz/