[Web4lib] Re: Google Search Appliance and OPACs
Martin Vojnar
vojnar at vkol.cz
Thu Feb 7 05:23:13 EST 2008
Dear all,
we did something like this. We dumped our catalog
(http://aleph.vkol.cz, approx. 1 million records)
into static HTML pages so that crawlers could come
and fetch them. Every static page has a link to
the live record in the catalog.
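
To illustrate the idea of a static page that points
back to the live record, here is a minimal sketch in
Python. It is not the script we actually ran: the
function name, the record fields and the LIVE_URL
pattern on catalog.example.org are assumptions made
up for the example.

```python
from html import escape

# Hypothetical URL pattern for a live catalog record; the real
# link syntax of the catalog is not given in this message.
LIVE_URL = "http://catalog.example.org/record/{record_id}"


def render_static_page(record_id: str, title: str, author: str) -> str:
    """Return a small static HTML page for one catalog record.

    The page carries the descriptive text a crawler can index and
    a single link back to the live record in the catalog.
    """
    live = LIVE_URL.format(record_id=record_id)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<p>{escape(author)}</p>\n"
        f'<p><a href="{escape(live)}">View the live record</a></p>\n'
        "</body></html>\n"
    )
```

Escaping the title and author keeps records that
contain characters such as & or < from producing
broken pages.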
First we built a tree of links between all the
records, so that robots would start at the home page
(http://aleph.vkol.cz/pub) and find the rest of the
records. This worked, but it took Google approx. 2
months to get all the records.
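
One possible shape for such a tree is sketched below
in Python. This is an assumption for illustration, not
a description of our actual pages: the function name,
the file names and the fan-out of 100 links per page
are all hypothetical.

```python
def build_link_tree(leaf_urls, fanout=100):
    """Group URLs into index pages of at most `fanout` links each.

    Returns a list of levels, lowest level first. Each level maps
    a (hypothetical) index file name to the list of links on that
    page. The last level holds the single root page, index.html.
    """
    levels = []
    current = list(leaf_urls)
    level_no = 0
    # Keep adding a layer of index pages until the remaining links
    # fit on one root page.
    while len(current) > fanout:
        pages = {}
        for i in range(0, len(current), fanout):
            name = f"index-{level_no}-{i // fanout}.html"
            pages[name] = current[i:i + fanout]
        levels.append(pages)
        current = list(pages)
        level_no += 1
    levels.append({"index.html": current})
    return levels
```

With a fan-out of 100, one million record pages need
two layers of index pages below the root, so a robot
has to follow three links from the home page before it
reaches a record.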
So we switched to a sitemap
(http://aleph.vkol.cz/sitemap.xml), and Google
crawled and indexed everything in 2 weeks.
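
For a catalog of this size, the sitemaps.org protocol
allows at most 50,000 URLs per sitemap file, so the
URLs have to be split over several files that are
listed in a sitemap index. The Python sketch below
shows one way to do that. It is not our actual
generator: the function name and the sitemap-N.xml
file names are assumptions, and it leaves out optional
elements such as lastmod.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(urls, base_url, max_urls=MAX_URLS_PER_FILE):
    """Return {file name: XML text} for sitemap files plus an index.

    `base_url` is the public URL prefix (ending in "/") under which
    the files will be served. The file names are hypothetical.
    """
    files = {}
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        entries = "".join(
            f"  <url><loc>{escape(u)}</loc></url>\n"
            for u in urls[start:start + max_urls]
        )
        files[f"sitemap-{n}.xml"] = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
        )
    # The index is built before it is added to `files`, so it lists
    # only the numbered sitemap files.
    index_entries = "".join(
        f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>\n"
        for name in files
    )
    files["sitemap.xml"] = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{index_entries}'
        "</sitemapindex>\n"
    )
    return files
```

One million record URLs would give 20 sitemap files
and one index file with this split.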
Our statistics show approx. 2000 new visitors every
day, with an 80% bounce rate. Obviously there are
many follow-up questions (the whole world is not our
target audience, so why publish the catalog in Google
instead of in local search engines, etc.), but this
was more or less just an experiment.
Other crawlers (Yahoo, MSN) do not match Google's
performance and do not work with sitemap files
efficiently.
BR, Martin
On 6 Feb 2008 at 21:27, Tim Spalding wrote:
> Has anyone tried just making a HUGE page of links and putting it
> somewhere Google will find it? Almost all OPACs allow direct links to
> records, by ISBN or something else. On a *few* (I've seen it on
> HiP), spidering this way causes serious session issues. (LibraryThing
> made this mistake once.) But it might be a way to get data into
> Google.
>
> Tim
--
Ing. Martin Vojnar, Research Library Olomouc,
Czech Republic
phone://+420 585 205 352
http://www.vkol.cz/