[Web4lib] Spidering speed
Tim Spalding
tim at librarything.com
Sat Mar 31 11:27:16 EDT 2007
Right. You control the retrieval speed, or turn it off. I think the
speeds should be 1 second, 2 seconds, 5 seconds. If your OPAC can't
handle requests every 5 seconds, you are in trouble anyway.
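
A minimal sketch of that setting, to show what "control the retrieval
speed, or turn it off" could look like in practice. This is not
LibraryThing's actual code: the names Throttle, crawl and ALLOWED_DELAYS
are hypothetical, and the fetch function is whatever the caller supplies.
The only things taken from the message are the three delays and the
off switch.

```python
import time
from typing import Callable, Iterable, List, Optional

# Assumed option set: the three delays suggested above, in seconds.
ALLOWED_DELAYS = (1, 2, 5)


class Throttle:
    """Enforces a minimum gap between requests; delay=None means 'off'."""

    def __init__(
        self,
        delay: Optional[float],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if delay is not None and delay not in ALLOWED_DELAYS:
            raise ValueError(f"delay must be None or one of {ALLOWED_DELAYS}")
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    @property
    def enabled(self) -> bool:
        return self.delay is not None

    def wait(self) -> None:
        """Block until at least `delay` seconds have passed since the last call."""
        if not self.enabled:
            raise RuntimeError("retrieval is turned off")
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    throttle: Throttle,
) -> List[str]:
    """Fetch each URL in turn, one request at a time, honoring the throttle."""
    results: List[str] = []
    if not throttle.enabled:
        return results  # the library has switched retrieval off
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

Requests are issued one at a time rather than threaded, so the OPAC
never sees more than one hit per chosen interval from this client.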
The widgets themselves don't hit the OPAC, only LibraryThing, via JS.
But we need to load up some data (e.g., titles and authors) before the
widgets can work. Asking for dumps in a specific format, or even big
MARC dumps, is too much of a pain for people.
I thought Z39.50s were generally running on separate boxes. Hmmm.
On 3/31/07, Ryan Eby <ryaneby at gmail.com> wrote:
> You'll probably find them mixed. I would hope that they could handle
> the slightly extra load in case patrons actually decide to use it.
> However, I guess it depends on whether you're threading multiple requests
> to the same opac. I've done various experiments with ours and never
> noticed a decrease in performance. There is also z39 and xml running
> on the same server which doesn't seem to hinder the performance. Ours
> could probably take more than a single hit but I know there are
> others that would probably crash and burn under any load.
>
> The library can always opt-out of the widgets at a later time if they
> notice adverse effects, right?
>
> Ryan Eby
>
> On 3/31/07, Tim Spalding <tim at librarything.com> wrote:
> > Related to the sitemap issue...
> >
> > Does anyone know what sort of tolerances OPACs have for spidering? Can
> > most handle the industry-standard 1 hit/second? (It may be industry
> > standard, but insofar as most OPACs are not spiderable with
> > conventional means, they may not be tested for it.) Can they handle
> > faster?
> >
> > We're going to be introducing a bunch of library widgets soon,
> > providing LibraryThing functionality to conventional OPACs. Some of
> > the widgets require some data to be fetched off the OPAC, basically
> > through screen-scraping. We need to know how hard we can hit them.
> >
> > Tim
>