[WEB4LIB] Collecting Stats.

James Cayz cayz at lib.de.us
Tue Oct 26 14:58:32 EDT 1999


On Tue, 26 Oct 1999, Ed Veal wrote:

>I am interested in finding out what different libraries report. We are
>going to start reporting stats from our web site; however, we are not
>sure what to report. We use WebTrends software to produce our reports.
>We are thinking of reporting "User Sessions" instead of "Hits". What
>do you think?
>
>Ed Veal  (edveal at mail.ci.lubbock.tx.us)
>Lubbock City-County Library
>library.ci.lubbock.tx.us

Ed,

IMHO, this is a double-edged sword waiting to hurt people all over the
place.  True, "User Sessions" sound a lot more like traditional person
counter type statistics, but unlike that beam across the door, User
Sessions are estimated, and can greatly mislead you.  In one case, a
multi-user machine (such as one that offers Lynx access, or one that runs
as a firewall) will never count as more than one User Session, even across
several thousand hits.  On the other hand, you have spiders & robots,
which produce sessions but aren't really "visitors".  And if someone
points this out, well, the number you have is just an estimate, and
doesn't really point to anything "true".
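
To make the "estimate" part concrete, here is a minimal Python sketch of
the kind of heuristic log analyzers typically use to infer "User Sessions"
(this is not WebTrends' exact algorithm - the 30-minute timeout and the
log fields are my assumptions, for illustration only):

    from datetime import datetime, timedelta

    SESSION_TIMEOUT = timedelta(minutes=30)

    def count_sessions(log_records):
        """log_records: iterable of (ip, timestamp, user_agent) tuples."""
        last_seen = {}   # ip -> time of most recent hit
        sessions = 0
        for ip, when, agent in sorted(log_records, key=lambda rec: rec[1]):
            # A robot announces itself in the User-Agent but still opens
            # "sessions", inflating the count with non-human visitors.
            previous = last_seen.get(ip)
            if previous is None or when - previous > SESSION_TIMEOUT:
                sessions += 1
            last_seen[ip] = when
        return sessions

    # Every patron on a shared Lynx host or behind one firewall arrives
    # from the same IP, so many real visitors collapse into one "session".
    hits = [
        ("10.0.0.5", datetime(1999, 10, 26, 9, 0), "Lynx/2.8"),
        ("10.0.0.5", datetime(1999, 10, 26, 9, 5), "Lynx/2.8"),   # second patron, same box
        ("10.0.0.5", datetime(1999, 10, 26, 9, 10), "Lynx/2.8"),  # third patron, same box
        ("192.0.2.9", datetime(1999, 10, 26, 9, 1), "SomeCrawler/1.0"),  # robot
    ]
    print(count_sessions(hits))   # 2 -- one "user" for three patrons, one robot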

Now, the same can be said of hits - what with proxy servers like AOL's
that (thankfully) cache an entire remote site for all of their users, your
hit count is *dramatically* less than the real number of pages viewed.
Additionally, there are all the millions of browser caches that don't
refresh unless the source has changed.  *SO*, the hit count is also
flawed, in that it doesn't give you a true count of pages *VIEWED*.  *BUT*
it *DOES* give you a true count of pages *SERVED* by your machine, which
can speak volumes about increased resource usage, etc., etc.
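
If you want to see what "served" really means, here is a rough sketch that
tallies an Apache-style Common Log Format access log (the log path and
format are my assumptions): full responses versus requests answered "304
Not Modified" because a browser or proxy cache already had a fresh copy.
Pages viewed straight from a cache, with no request at all, never show up
here - which is exactly the undercounting problem above.

    import re

    LOG_LINE = re.compile(
        r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
    )

    def summarize(path):
        served, not_modified, other = 0, 0, 0
        with open(path) as fh:
            for line in fh:
                m = LOG_LINE.match(line)
                if not m:
                    continue
                status = m.group("status")
                if status == "200":
                    served += 1          # a page (or image) actually sent
                elif status == "304":
                    not_modified += 1    # cache revalidation, nothing re-sent
                else:
                    other += 1
        return served, not_modified, other

    # Usage (path is an assumption): summarize("/var/log/httpd/access_log")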

Another real problem that we will increasingly face is actually tracking
users.  Our website (www.lib.de.us) uses a search engine and "pages" of
calls to templates to emulate what appears to the browser to be static
pages.  Why do this?  Well, we can now add and delete URLs and their
descriptions in a database that is continuously updated.  The user doesn't
know they are using a database & search engine, but we no longer have to
"update" the pages - they're just empty templates that get filled with
data from the database....  Having said all that - when was the last time
a log analyzer was able to track a "session" through the CGIs???  One, the
URLs produced are *extremely* long (> 255 chars), and may include
user-provided input, so tracking the "most common path" is very difficult.
Two, many analysis programs see something going into a CGI and STOP, since
it is difficult to "know" what happens next.  This is a real problem for
"link checkers". :-(

My advice would be to resist the sheep-herding, non-standardized "methods"
of making Internet stats just like other "real-world" stats, figure out
why you need the stats (More network capacity?  More CPU power?), and
concentrate on providing *those* indicators.
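
For example, if the question really is network capacity or CPU, something
as simple as requests and bytes served per day is a more honest indicator
than "visitors".  Another rough Python sketch against an assumed
Apache-style access log (path and format are assumptions):

    import re
    from collections import defaultdict

    LOG_LINE = re.compile(
        r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-)'
    )

    def daily_load(path):
        requests = defaultdict(int)
        bytes_out = defaultdict(int)
        with open(path) as fh:
            for line in fh:
                m = LOG_LINE.match(line)
                if not m:
                    continue
                day = m.group("day")            # e.g. "26/Oct/1999"
                requests[day] += 1
                if m.group("bytes") != "-":
                    bytes_out[day] += int(m.group("bytes"))
        for day in sorted(requests):
            print(day, requests[day], "requests,", bytes_out[day], "bytes")

    # daily_load("/var/log/httpd/access_log")   # path is an assumption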

Just my $0.02

James Cayz

+--------------------------------------------------------------------------+
| James Cayz  #  cayz at lib.de.us #  DelAWARE homepage: http://www.lib.de.us |
| Network Processing Administrator #  302-739-4748 x130 # Fax 302-739-6948 |
| Delaware Division of Libraries # 43 S. DuPont Hwy / Dover, DE 19901-7430 |
+--------------------------------------------------------------------------+


