[WEB4LIB] Re: Z39.50 Discussion on Web4Lib

Sun Feb 11 16:20:39 EST 2001

At 15:22 09-02-01 -0800, Matthew Dovey wrote:

> > There are scalability issues with Z39.50.  Multithreaded searching of more
> > than 5-7 institutions at a time can result in bottlenecks due to the
> > client/server communications overhead.
>
>Sebastian Hammer from Index Data has successfully searched about 200 Z39.50
>targets simultaneously achieving the same response time as searching 1
>target. In fact the response time is determined by the slowest Z39.50
>server, so when searching 200 you are statistically more likely to hit one
>which is slow (due to insufficient hardware etc.) or down which can make the
>search appear slower.
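Matthew's statistical point can be made concrete with a simple model. The sketch below assumes, purely for illustration, that every target is slow or down independently and with the same probability p (real targets are neither independent nor identical). Under that assumption the chance that at least one of n targets holds up the search is 1 - (1 - p)^n.

```python
# Simple model, ASSUMING every target is slow or down independently with
# the same probability p (a simplification, not a measured property).
def prob_any_slow(n: int, p: float) -> float:
    """Probability that at least one of n targets is slow: 1 - (1 - p)**n."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 0 and 0 <= p <= 1")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.01  # hypothetical: 1 target in 100 is slow or down at any moment
    for n in (1, 7, 100, 200):
        print(f"{n:>3} targets: {prob_any_slow(n, p):.3f}")
```

With the assumed p = 0.01 this gives about 0.010 for one target, 0.068 for seven, 0.634 for 100 and 0.866 for 200. That is why a large search is much more likely to wait on, or time out against, at least one misbehaving server.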

Actually, it was "only" 100 targets, but Matthew is correct that we 
observed no significant latency introduced by the concurrent searches, and 
frequently saw response times of 5-7 seconds to search and fetch a pageful 
of records (approximately equal to the average response time of the slowest 
server in the group). Note that these tests were carried out running a 
client on a well-connected network - this would most likely NOT work over a 
56K modem. Before we formally publicise our results, I expect we will have 
made the test with 200 targets as well.
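Total latency tracks the slowest server because all requests are issued concurrently rather than one after another. The sketch below simulates this with Python's asyncio. `search_target` is a hypothetical stand-in that only sleeps, where a real client would exchange Z39.50 Search and Present requests with the target. The delays are invented figures, not the measurements reported above.

```python
import asyncio
import time


# Hypothetical stand-in for a Z39.50 search+present round trip; a real client
# would talk to the target here instead of sleeping.
async def search_target(name: str, delay: float) -> tuple[str, float]:
    await asyncio.sleep(delay)  # simulated server + network time
    return name, delay


async def fan_out(delays: dict[str, float]) -> tuple[list[tuple[str, float]], float]:
    """Query every target concurrently; return the results and wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(search_target(name, d) for name, d in delays.items())
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    # 100 simulated targets: 99 answer in 0.1 s, one laggard takes 0.5 s.
    delays = {f"target{i:03d}": 0.1 for i in range(99)}
    delays["slow-target"] = 0.5
    results, elapsed = asyncio.run(fan_out(delays))
    print(f"{len(results)} targets answered in {elapsed:.2f} s")
```

Queried one after another, these simulated targets would take the sum of the delays (about 10.4 s). Queried concurrently, the wall-clock time stays close to the slowest target's 0.5 s. Whether a real client achieves this also depends on the network and on how many connections it can hold open, which the simulation ignores.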

>Of course it is possible to write bad multithreading code as well as good
>efficient multithreading code, and I suspect that some programmers may have
>made the claim that you can't search more than 7 targets to cover poor
>performance in their clients...

I think mostly the problem has to do with a failure to analyse the sources 
of delays. Most often, this turns out to be sub-optimal target 
implementations that delay the whole process by taking inordinate amounts 
of time to do simple tasks. Often these problems are resolved once 
cross-searching clients are seriously deployed, and user needs are made 
visible.
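Analysing the sources of delay starts with timing each target separately instead of only timing the whole cross-search. The minimal sketch below again uses simulated targets. The `delays` values, the `timed` helper and the "three times the median" threshold are illustrative assumptions and do not come from any Z39.50 toolkit. The sketch records per-target latency and ranks the slowest first.

```python
import asyncio
import statistics
import time


async def timed(name: str, delay: float) -> tuple[str, float]:
    """Time one simulated target; a real client would time the Z39.50 exchange."""
    start = time.perf_counter()
    await asyncio.sleep(delay)  # placeholder for connect + search + present
    return name, time.perf_counter() - start


async def profile_targets(delays: dict[str, float]) -> list[tuple[str, float]]:
    """Query all targets concurrently and return (name, seconds), slowest first."""
    timings = await asyncio.gather(*(timed(n, d) for n, d in delays.items()))
    return sorted(timings, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    delays = {"opac-a": 0.05, "opac-b": 0.40, "opac-c": 0.08, "opac-d": 0.06}
    ranking = asyncio.run(profile_targets(delays))
    median = statistics.median(t for _, t in ranking)
    for name, seconds in ranking:
        # Arbitrary rule of thumb: flag targets far slower than the median.
        flag = "  <-- worth investigating" if seconds > 3 * median else ""
        print(f"{name}: {seconds:.2f} s{flag}")
```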

There *are* scalability issues, but my sense is in practice they will have 
more to do with quality-of-service and reliability issues (i.e., if you 
search 500 targets, do you *need* a response from each one?).
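One way to act on the quality-of-service question is to give the whole search a deadline, present whatever has arrived, and report the targets that did not answer in time. This is a sketch under assumed conditions. Targets are simulated with `asyncio.sleep`, and the deadline value is arbitrary.

```python
import asyncio


async def search_with_deadline(
    delays: dict[str, float], deadline: float
) -> tuple[list[str], list[str]]:
    """Return (answered, timed_out) target names after at most `deadline` seconds."""
    # Simulated targets: each task finishes after its delay and yields its name.
    tasks = {
        asyncio.create_task(asyncio.sleep(delay, result=name)): name
        for name, delay in delays.items()
    }
    done, pending = await asyncio.wait(list(tasks), timeout=deadline)
    for task in pending:
        task.cancel()  # stop waiting for targets that missed the deadline
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    answered = sorted(task.result() for task in done)
    timed_out = sorted(tasks[task] for task in pending)
    return answered, timed_out


if __name__ == "__main__":
    delays = {"fast": 0.05, "typical": 0.2, "overloaded": 3.0, "unreachable": 60.0}
    answered, timed_out = asyncio.run(search_with_deadline(delays, deadline=1.0))
    print("answered:", answered)
    print("no reply within deadline:", timed_out)
```

The user sees results after at most the deadline, and the list of silent targets is kept so the client can say which catalogues were not searched. It does not quietly present a partial result as complete.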

Perhaps the most important issue, and one which is sometimes ignored in 
discussions of Z39.50-based virtual union catalogs of any form, has to do 
with server-side scalability (rather than client-side scalability, which is 
where the "mystic" bottlenecks are quoted). Many library systems in smaller 
libraries are really only scaled to handle a handful of workstations and 
perhaps the odd web-user visiting the OPAC from home. But if you make the 
local library visible in a large-scale, regional or national virtual union 
catalogue using parallel searching - then you had better make sure the 
local systems are capable of handling the load. Either that, or you need to 
devise ways to avoid sending user queries to irrelevant databases.
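Avoiding irrelevant databases also addresses the server-side concern, because a broadcast design sends every query to every local system regardless of its size. The sketch below assumes that each target has been described by a set of subject tags. The `PROFILES` table is entirely hypothetical, and where such descriptions would come from is left open. The sketch routes a query only to targets whose tags overlap it, then compares per-target load with and without routing.

```python
from collections import Counter

# HYPOTHETICAL coverage profiles: the subject tags each target is assumed to
# hold. How such descriptions would be collected is outside this sketch.
PROFILES: dict[str, set[str]] = {
    "university": {"medicine", "law", "history", "music"},
    "small-public-a": {"fiction", "local-history"},
    "small-public-b": {"fiction", "children"},
    "conservatory": {"music"},
}


def route(subjects: set[str], profiles: dict[str, set[str]]) -> list[str]:
    """Targets whose profile shares at least one subject with the query."""
    return sorted(name for name, tags in profiles.items() if tags & subjects)


def per_target_load(
    queries: list[set[str]], profiles: dict[str, set[str]], routed: bool = True
) -> Counter:
    """Count how many queries each target receives, with or without routing."""
    load: Counter = Counter()
    for subjects in queries:
        load.update(route(subjects, profiles) if routed else profiles.keys())
    return load


if __name__ == "__main__":
    queries = [{"music"}, {"law"}, {"fiction"}, {"medicine"}, {"history"}]
    print("broadcast:", dict(per_target_load(queries, PROFILES, routed=False)))
    print("routed:   ", dict(per_target_load(queries, PROFILES)))
```

With broadcasting, each small public library receives all five queries. With routing, it receives only the one that plausibly concerns it. A query that matches no profile is sent nowhere in this sketch. A real service would probably fall back to a broader search rather than return nothing.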

--Sebastian
--
Sebastian Hammer        <quinn at indexdata.dk>            Index Data ApS
Ph.: +45 3341 0100    <http://www.indexdata.dk>    Fax: +45 3341 0101