[Web4lib] Lies, damn lies, and usage statistics?
Ellsworth, Joshua R.
jrellsworth at liberty.edu
Fri Mar 9 07:00:06 EST 2007
I had always assumed that the statistics were accurate, too. We were
thinking about purchasing "Scholarly Stats" to try to make our
Electronic Resources Librarian's job easier. Does anyone know if it
suffers from similar problems?
Josh
--
Joshua Ellsworth
Library System Administrator
Guillermin ILRC
Liberty University
434.592.3243 jrellsworth at liberty.edu
-----Original Message-----
From: web4lib-bounces at webjunction.org
[mailto:web4lib-bounces at webjunction.org] On Behalf Of Stacy Pober
Sent: Thursday, March 08, 2007 9:39 PM
To: web4lib at webjunction.org
Subject: [Web4lib] Lies, damn lies, and usage statistics?
I have been downloading annual database usage statistics for our
library's electronic databases.
Looking at the statistics from one of our vendors (EBSCOhost), I noticed
a peculiar thing. Some of the databases in the report were ones for
which we had no subscriptions and no access. Yet the report showed usage
for those databases, and it was for multiple months in several
databases, so it was not a one-time computer foul-up.
When I contacted their technical support and reported this, they said
that this was a "known issue" and explained that we could deselect those
databases when generating the usage report. But I didn't simply want
to make the obviously bad data invisible; I wanted to know why it was
there and whether the other figures, those that were not so obviously
fictional, were accurate.
When pressed for more information on the exact nature of the problem,
the helpful support person did not elaborate, but wrote:
" I have filed a Service Issue (think of it as a work order) to have
your statistics "scrubbed" so that you will only be left with your
actual statistics."
When asked for the specific reason that we are seeing fictional usage
statistics for several databases, he again assured me it was a "known
issue" (I don't know if he thought that this was a good
substitute for a detailed explanation. It is not.) He sent no
technical details and wrote:
"please rest assured that this type of problem is rare, and that
the statistics gathered by the system are quite accurate."
Which seems to miss the point. If we don't know what caused the
problem, why would we assume any of the usage statistics are accurate?
The erasure of glaringly wrong figures isn't a reason to believe that
the remaining information in a report is correct.
This isn't the only vendor that provided inaccurate usage data this
year.
Another vendor's statistics showed zero usage after our subscription
started. Since I had used it at the beginning of the subscription
period, it was clear something was wrong. When this anomaly was
reported, the vendor never explained what the problem was, but sent us
some completely different (and - surprise! - much higher) usage
figures.
In the past, I never really thought about this issue, and just assumed
that most of the database usage information provided by our vendors
was reasonably accurate. This was an inappropriately optimistic
assumption.
As far as I know, there is no way to validate most of the statistics
provided to us by database vendors.
Some independent data can be obtained from our EZproxy logs, as they
show the number of times users accessed particular databases.
However, though the EZproxy server has some detailed information about
off-campus use, our on-campus users don't interact with it past the
initial database link selection.
Even if all of our usage was routed through the EZproxy server, those
logs aren't kept for that purpose, and I don't think they show some of
the most useful information, such as the number of abstracts and
full-text documents accessed. For the databases with full-text, the
number of full-text articles or documents used is a significant figure.
The EZproxy logs can be analyzed to show pdf downloads, but many of our
databases offer much of the full-text as HTML.
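For what it's worth, tallying apparent PDF retrievals from a proxy log
only takes a short script. This is just a sketch: the combined-style
log format, and the idea that the database can be roughly identified
from the requested hostname, are assumptions about a local setup, not
a description of any particular EZproxy configuration.

```python
import re
from collections import Counter

# Assumed: each log line contains a quoted request section like
# "GET <url> HTTP/1.x", as in common/combined log formats.
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def count_pdf_downloads(lines):
    """Tally apparent PDF retrievals per hostname in proxy log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = match.group(1)
        # Crude test: treat any .pdf path as a full-text download.
        path = url.split("?", 1)[0]
        if path.lower().endswith(".pdf"):
            # The hostname is a rough stand-in for the database name.
            host = url.split("/")[2] if "://" in url else "unknown"
            counts[host] += 1
    return counts
```

As noted above, this misses exactly the case where it matters: HTML
full text, which many databases favor, leaves no such signature in the
log.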
Our openURL system offers some statistics on full-text retrievals, but
it only comes into play when the search and the full text are in
different databases. Sessions where the search and the full-text
retrieval happen within the same database never touch it.
Aside from the limited nature of the independent usage statistics
available, accuracy checks on the vendor-supplied statistics would be
a major pain to do on a regular basis.
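As a rough illustration of what such a check might look like (the
database names, counts, and tolerance below are all invented), one
could compare vendor-reported figures against an independent tally and
flag large discrepancies:

```python
def flag_discrepancies(vendor, independent, tolerance=0.5):
    """Return databases whose vendor-reported count differs from the
    independent count by more than the given fractional tolerance.

    vendor / independent: dicts mapping database name -> usage count.
    """
    flagged = {}
    for db, reported in vendor.items():
        observed = independent.get(db, 0)
        baseline = max(observed, 1)  # avoid division by zero
        if abs(reported - observed) / baseline > tolerance:
            flagged[db] = (reported, observed)
    return flagged

# Hypothetical example: the vendor reports usage for a database
# ("PsycFOO") that the independent count never saw at all.
vendor_counts = {"Academic Search": 120, "PsycFOO": 40}
proxy_counts = {"Academic Search": 100}
```

A tolerance is needed because the two counts measure different things
(sessions vs. retrievals, on-campus vs. proxied traffic), so exact
agreement should never be expected even when both are honest.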
I'm just bringing this up as a concern. I'm sure that I am not the only
librarian who assumed in the past that the vendor supplied usage data
was correct. Since we use that data as an important factor in our
database acquisition and renewal decisions, it would be nice to have
some independent assurance of the accuracy of the data we're getting
from vendors.
I don't really think that our database providers are using Ouija boards
to produce our usage reports. The question is whether they are
routinely checking the validity of the figures they collect and supply
to us.
Apparently some of them are not doing logic and accuracy testing of
the software they use to produce the usage statistics.
Has anyone checked the accuracy of vendor-supplied database usage data?
If you have, how did you do it and what results did you find?
--
Stacy Pober
Information Alchemist
Manhattan College
O'Malley Library
Riverdale, NY 10471
stacy.pober at manhattan.edu
"If you want to inspire confidence, give plenty of statistics.
It does not matter that they should be accurate, or even intelligible,
as long as there is enough of them." - Lewis Carroll
_______________________________________________
Web4lib mailing list
Web4lib at webjunction.org
http://lists.webjunction.org/web4lib/