[Web4lib] Lies, damn lies, and usage statistics?
Ellsworth, Joshua R.
jrellsworth at liberty.edu
Fri Mar 9 07:00:06 EST 2007
I had always assumed that the statistics were accurate, too. We were
thinking about purchasing "Scholarly Stats" to try to make our
Electronic Resources Librarian's job easier. Does anyone know if it
suffers from similar problems?
Josh
--
Joshua Ellsworth
Library System Administrator
Guillermin ILRC
Liberty University
434.592.3243 jrellsworth at liberty.edu
-----Original Message-----
From: web4lib-bounces at webjunction.org
[mailto:web4lib-bounces at webjunction.org] On Behalf Of Stacy Pober
Sent: Thursday, March 08, 2007 9:39 PM
To: web4lib at webjunction.org
Subject: [Web4lib] Lies, damn lies, and usage statistics?
I have been downloading annual database usage statistics for our
library's electronic databases.
Looking at the statistics from one of our vendors (EBSCOhost), I noticed
a peculiar thing. Some of the databases in the report were ones for
which we had no subscriptions and no access. Yet the report showed usage
for those databases, and it was for multiple months in several
databases, so it was not a one-time computer foul-up.
When I contacted their technical support and reported this, they said
that this was a "known issue" and explained that we could deselect those
databases when generating the usage report. But I didn't simply want
to make the obviously bad data invisible; I wanted to know why it was
there and whether the other figures, those that were not so obviously
fictional, were accurate.
When pressed for more information on the exact nature of the problem,
the helpful support person did not elaborate, but wrote:
" I have filed a Service Issue (think of it as a work order) to have
your statistics "scrubbed" so that you will only be left with your
actual statistics."
When asked for the specific reason that we are seeing fictional usage
statistics for several databases, he again assured me it was a "known
issue" (I don't know if he thought that this was a good
substitute for a detailed explanation. It is not.) He sent no
technical details and wrote:
"please rest assured that this type of problem is rare, and that
the statistics gathered by the system are quite accurate."
Which seems to miss the point. If we don't know what caused the
problem, why would we assume any of the usage statistics are accurate?
The erasure of glaringly wrong figures isn't a reason to believe that
the remaining information in a report is correct.
This isn't the only vendor that provided inaccurate usage data this
year.
Another vendor's statistics showed zero usage after our subscription
started. Since I had used it at the beginning of the subscription
period, it was clear something was wrong. When this anomaly was
reported, the vendor never explained what the problem was, but sent us
some completely different (and - surprise! - much higher) usage
figures.
In the past, I never really thought about this issue, and just assumed
that most of the database usage information provided by our vendors
was reasonably accurate. This was an inappropriately optimistic
assumption.
As far as I know, there is no way to validate most of the statistics
provided to us by database vendors.
Some independent data can be obtained from our EZproxy logs, as they
show the number of times users accessed particular databases.
However, though the EZproxy server has some detailed information about
off-campus use, our on-campus users don't interact with it past the
initial database link selection.
Even if all of our usage was routed through the EZproxy server, those
logs aren't kept for that purpose, and I don't think they show some of
the most useful information, such as the number of abstracts and
full-text documents accessed. For the databases with full-text, the
number of full-text articles or documents used is a significant figure.
The EZproxy logs can be analyzed to show pdf downloads, but many of our
databases offer much of the full-text as HTML.
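For what it's worth, tallying apparent PDF retrievals from a proxy log
only takes a short script. This is just a sketch: the combined-style
log format, and the idea that the database can be roughly identified
from the requested hostname, are assumptions about a local setup, not
a description of any particular EZproxy configuration.

```python
import re
from collections import Counter

# Assumed: each log line contains a quoted request section like
# "GET <url> HTTP/1.x", as in common/combined log formats.
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def count_pdf_downloads(lines):
    """Tally apparent PDF retrievals per hostname in proxy log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = match.group(1)
        # Crude test: treat any .pdf path as a full-text download.
        path = url.split("?", 1)[0]
        if path.lower().endswith(".pdf"):
            # The hostname is a rough stand-in for the database name.
            host = url.split("/")[2] if "://" in url else "unknown"
            counts[host] += 1
    return counts
```

As noted above, this misses exactly the case where it matters: HTML
full text, which many databases favor, leaves no such signature in the
log.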
Our openURL system offers some statistics on full-text retrievals, but
it only comes into play when the search and the full text are in
different databases. Sessions where the search and the full-text
retrieval happen within the same database never touch it.
Aside from the limited nature of the independent usage statistics
available, accuracy checks on the vendor-supplied statistics would be
a major pain to do on a regular basis.
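As a rough illustration of what such a check might look like (the
database names, counts, and tolerance below are all invented), one
could compare vendor-reported figures against an independent tally and
flag large discrepancies:

```python
def flag_discrepancies(vendor, independent, tolerance=0.5):
    """Return databases whose vendor-reported count differs from the
    independent count by more than the given fractional tolerance.

    vendor / independent: dicts mapping database name -> usage count.
    """
    flagged = {}
    for db, reported in vendor.items():
        observed = independent.get(db, 0)
        baseline = max(observed, 1)  # avoid division by zero
        if abs(reported - observed) / baseline > tolerance:
            flagged[db] = (reported, observed)
    return flagged

# Hypothetical example: the vendor reports usage for a database
# ("PsycFOO") that the independent count never saw at all.
vendor_counts = {"Academic Search": 120, "PsycFOO": 40}
proxy_counts = {"Academic Search": 100}
```

A tolerance is needed because the two counts measure different things
(sessions vs. retrievals, on-campus vs. proxied traffic), so exact
agreement should never be expected even when both are honest.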
I'm just bringing this up as a concern. I'm sure that I am not the only
librarian who assumed in the past that the vendor supplied usage data
was correct. Since we use that data as an important factor in our
database acquisition and renewal decisions, it would be nice to have
some independent assurance of the accuracy of the data we're getting
from vendors.
I don't really think that our database providers are using Ouija boards
to produce our usage reports. The question is whether they are
routinely checking the validity of the figures they collect and supply
to us.
Apparently some of them are not doing logic and accuracy testing of
the software they use to produce the usage statistics.
Has anyone checked the accuracy of vendor-supplied database usage data?
If you have, how did you do it and what results did you find?
--
Stacy Pober
Information Alchemist
Manhattan College
O'Malley Library
Riverdale, NY 10471
stacy.pober at manhattan.edu
"If you want to inspire confidence, give plenty of statistics.
It does not matter that they should be accurate, or even intelligible,
as long as there is enough of them." - Lewis Carroll
_______________________________________________
Web4lib mailing list
Web4lib at webjunction.org
http://lists.webjunction.org/web4lib/