FW: [WEB4LIB] Web Log Analysis

Gimon, Charles A CAGimon at mpls.lib.mn.us
Thu Jan 11 17:45:55 EST 2001


1. I produce and distribute a monthly report which includes:

--Unique Daily Visitors for the month

This is the number of unique IP addresses/hostnames logged for each day,
totaled for the month. This corresponds in a rough way with our gate counts.
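
A minimal sketch of that count, in Python rather than the perl mentioned later in this message. It assumes the server writes NCSA Common Log Format lines (host first, timestamp in square brackets); the regular expression and the function name are illustrative, not taken from the original scripts.

```python
import re
from collections import defaultdict

# Assumed line layout (Common Log Format):
# host ident user [dd/Mon/yyyy:hh:mm:ss zone] "request" status bytes
CLF_PREFIX = re.compile(r'^(\S+) \S+ \S+ \[([^:\]]+):')

def unique_daily_visitors(lines):
    """Return ({day: distinct host count}, sum of the daily counts)."""
    hosts_by_day = defaultdict(set)
    for line in lines:
        m = CLF_PREFIX.match(line)
        if m:
            host, day = m.groups()
            hosts_by_day[day].add(host.lower())
    per_day = {day: len(hosts) for day, hosts in hosts_by_day.items()}
    return per_day, sum(per_day.values())

sample = [
    '10.0.0.1 - - [11/Jan/2001:09:00:00 -0600] "GET / HTTP/1.0" 200 1024',
    '10.0.0.1 - - [11/Jan/2001:09:05:00 -0600] "GET /list.asp HTTP/1.0" 200 2048',
    '10.0.0.2 - - [11/Jan/2001:10:00:00 -0600] "GET / HTTP/1.0" 200 1024',
    '10.0.0.1 - - [12/Jan/2001:09:00:00 -0600] "GET / HTTP/1.0" 200 1024',
]
per_day, total = unique_daily_visitors(sample)
```

A host that returns on a second day is counted again, which is what makes the monthly figure comparable to a gate count.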

--Total Page Hits overall

With subtotals for these three categories: staff, internal/public (our
public Internet workstations), and external. Staff is a separate directory
on the webserver; the other two categories are distinguished by IP address.
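
The three-way split could be sketched as below (Python, not the perl used for the real reports). The directory name /staff/ and the address range 192.168.10.0/24 are placeholders invented for the example; the actual values are not given here.

```python
import ipaddress
from collections import Counter

STAFF_PREFIX = "/staff/"                               # hypothetical directory name
PUBLIC_NET = ipaddress.ip_network("192.168.10.0/24")   # hypothetical workstation range

def classify_hit(host, path):
    """Label one hit as 'staff', 'internal/public', or 'external'."""
    if path.startswith(STAFF_PREFIX):
        return "staff"
    try:
        if ipaddress.ip_address(host) in PUBLIC_NET:
            return "internal/public"
    except ValueError:
        pass  # host was logged as a resolved name, not an IP; treat as external
    return "external"

hits = [
    ("192.168.10.25", "/index.html"),
    ("192.168.10.25", "/staff/schedule.html"),
    ("dialup-12.example.net", "/list.asp"),
    ("203.0.113.9", "/staff/index.html"),
]
subtotals = Counter(classify_hit(host, path) for host, path in hits)
```

Checking the directory before the address means a staff page viewed from a public workstation still counts as staff; the other order would be just as easy to write.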

--Top Pages overall

The most popular pages in descending order by number of page hits. Also
lists of the most popular pages for staff, internal/public, and external.
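
A ranking like this is a short tally once the request paths are in hand. The sketch below (Python; the function name is illustrative) drops the query string so that every list.asp request counts toward one page, which is an assumption about how "page" is defined.

```python
from collections import Counter

def top_pages(paths, n=10):
    """Return the n most requested pages as (path, hits), most popular first."""
    counts = Counter(path.split("?", 1)[0] for path in paths)
    return counts.most_common(n)

requests = [
    "/", "/list.asp?subhead=A", "/list.asp?subhead=B",
    "/", "/hours.html", "/list.asp",
]
ranking = top_pages(requests, n=2)
```

Running the same function over only the staff, internal/public, or external hits gives the per-category lists.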

--Top Categories in the LIST (our web directory)

Since these appear as the appended GET query in the URL, for example:

http://www.mpls.lib.mn.us/list.asp?subhead=Science+_and_+Technology:Astronomy

I can extract these and report on them.
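
As an illustration only (Python standard library, not the original perl), the category can be pulled out of the query string and tallied like this. The parameter name subhead comes from the URL above; the function names are made up for the sketch.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def query_value(url, param="subhead"):
    """Return the decoded value of one GET parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None

def top_values(urls, param="subhead"):
    """Tally a GET parameter across many requests, most frequent first."""
    counts = Counter()
    for url in urls:
        value = query_value(url, param)
        if value is not None:
            counts[value] += 1
    return counts.most_common()

example = ("http://www.mpls.lib.mn.us/list.asp"
           "?subhead=Science+_and_+Technology:Astronomy")
category = query_value(example)
```

parse_qs decodes each + to a space, so the category comes back as "Science _and_ Technology:Astronomy". Passing a different param name would handle other appended queries the same way.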

--Top Subscription Databases by Clickthrough
--Top Links in the LIST by Clickthrough

We log clickthroughs on these links; this is done in a database, however,
not through web logs.

--Search Queries in the LIST, and for our entire site

Again, these are appended GET queries, extracted and reported on in a
separate document.

I also produce specialized reports on request about usage of specific pages
or areas of the site.

Some things that I've done on other jobs or for myself personally:

--Usage by Domain

Fun to do ("We got three hits from Estonia!"), but can lead to questions
about unresolved IP addresses and misconfigured hosts ("Where is .arpa,
anyway?").
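
A rough sketch of a domain breakdown in Python. It assumes the log already holds resolved hostnames where resolution succeeded, and it files bare IP addresses under a made-up "(unresolved)" label so they do not get mistaken for a domain.

```python
import ipaddress
from collections import Counter

def domain_of(host):
    """Return the top-level domain of a logged host, or '(unresolved)' for a bare IP."""
    try:
        ipaddress.ip_address(host)
        return "(unresolved)"
    except ValueError:
        pass  # not an IP address, so treat it as a hostname
    return "." + host.rstrip(".").rsplit(".", 1)[-1].lower()

hosts = [
    "dialup-12.isp.ee", "proxy.example.COM", "192.0.2.7",
    "7.2.0.192.in-addr.arpa", "cache.example.com",
]
by_domain = Counter(domain_of(h) for h in hosts)
```

The .arpa entry in the sample is the kind of misconfigured reverse lookup that prompts the question above.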

--Usage by Browser/Platform

Not as straightforward to do from scratch (MSIE identifies itself as
Mozilla in the logs, etc.) but can still give interesting info. Related to
this is:

--Robot Activity

Who's indexing you? Often you can just pull the last several totals from a
Browser/Platform report to get this.
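
A deliberately crude sketch of sorting User-Agent strings, in Python. It assumes the server logs the User-Agent at all, and the substring hints and category names are guesses for illustration; a real report would need a longer, maintained list.

```python
from collections import Counter

ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")  # illustrative, not exhaustive

def classify_agent(user_agent):
    """Bucket a User-Agent string into a coarse browser/robot category."""
    low = user_agent.lower()
    if any(hint in low for hint in ROBOT_HINTS):
        return "robot"
    if "opera" in low:            # checked first: Opera may also claim to be MSIE
        return "Opera"
    if "msie" in low:             # MSIE also starts with "Mozilla/"
        return "MSIE"
    if low.startswith("mozilla/"):
        return "Netscape/Mozilla"
    return "other"

agents = [
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)",
    "Mozilla/4.76 [en] (X11; U; Linux 2.2.16 i686)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 5.0 [en]",
]
by_agent = Counter(classify_agent(a) for a in agents)
```

The order of the checks matters, since the more specific tokens have to be tested before the generic "Mozilla/" prefix.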

--Error logs

Can be your best friend in finding troublesome spots in your site that users
still haven't reported. Might be in a separate file from your regular
webserver logs, depending on your server and configuration.
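
Error-log formats vary by server, so the sketch below (Python) takes a different route and is only an assumption-laden stand-in: it reads Common Log Format access lines and tallies requests whose status code is 4xx or 5xx, which surfaces many of the same broken spots.

```python
import re
from collections import Counter

# Assumed access-log fragment: "METHOD /path PROTOCOL" status
REQ_STATUS = re.compile(r'"\S+ (\S+)[^"]*" (\d{3})\b')

def failing_requests(lines):
    """Count (status, path) pairs for requests that ended in a 4xx or 5xx."""
    errors = Counter()
    for line in lines:
        m = REQ_STATUS.search(line)
        if m and m.group(2)[0] in "45":
            errors[(m.group(2), m.group(1))] += 1
    return errors

sample = [
    '10.0.0.1 - - [11/Jan/2001:09:00:00 -0600] "GET /old/page.html HTTP/1.0" 404 210',
    '10.0.0.2 - - [11/Jan/2001:09:01:00 -0600] "GET /old/page.html HTTP/1.0" 404 210',
    '10.0.0.2 - - [11/Jan/2001:09:02:00 -0600] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.3 - - [11/Jan/2001:09:03:00 -0600] "GET /list.asp HTTP/1.0" 500 0',
]
errors = failing_requests(sample)
```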

--Referer logs

Can help you see who is linking to you. Also might be in a separate file.
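
If the referer is written into the access log itself, a tally of outside referring sites might look like the following Python sketch. It assumes NCSA Combined Log Format (referer and user agent as the last two quoted fields); the hostname default comes from the URL quoted earlier in this message, and the function name is illustrative.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed tail of a Combined Log Format line:
# "request" status bytes "referer" "user-agent"
COMBINED_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"\s*$')

def external_referrers(lines, own_host="www.mpls.lib.mn.us"):
    """Count referring hostnames, skipping blank referers and the site itself."""
    counts = Counter()
    for line in lines:
        m = COMBINED_TAIL.search(line)
        if not m:
            continue  # line has no referer field in the assumed position
        host = urlsplit(m.group(1)).hostname
        if host and host != own_host:
            counts[host] += 1
    return counts

ua = '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"'
sample = [
    '10.0.0.9 - - [11/Jan/2001:09:00:00 -0600] "GET / HTTP/1.0" 200 1024 "http://www.example.org/links.html" ' + ua,
    '10.0.0.9 - - [11/Jan/2001:09:01:00 -0600] "GET /list.asp HTTP/1.0" 200 2048 "http://www.mpls.lib.mn.us/" ' + ua,
    '10.0.0.8 - - [11/Jan/2001:09:02:00 -0600] "GET / HTTP/1.0" 200 1024 "-" ' + ua,
]
referrers = external_referrers(sample)
```

If every line fails to match, that is a hint the server is not logging the referer in this form.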

All of the above come with all the usual caveats: that in many cases you're
counting machines, not people; that your pages could be cached elsewhere;
that info can be spoofed; etc. etc.

Also note that exactly which items your server logs is generally a
configurable option; be sure that your server is, in fact, logging the
referer info before promising anyone a report on it.

2. I've always written this stuff from scratch in perl, and customized it
for my needs. (I can't stand overpriced, underfeatured pre-written software
for little tasks like this...) This sort of thing isn't rocket science; the
only thing even possibly off-putting about it is that the files you're
working with can get awfully large.
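
On the file-size point, the usual approach is to read one line at a time instead of loading the whole log. A small Python sketch follows; the function name, the .gz handling, and the latin-1 encoding are assumptions, not details of the perl scripts.

```python
import gzip

def log_lines(path):
    """Yield log lines one at a time so memory use stays flat for huge files."""
    opener = gzip.open if path.endswith(".gz") else open
    # latin-1 maps every byte to a character, so odd bytes never stop the run
    with opener(path, "rt", encoding="latin-1", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")

# Example use (hypothetical file name):
#   hits = sum(1 for _ in log_lines("access.log"))
```

Because it is a generator, the tallying code only ever holds the current line plus whatever counters it is building.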


--Charles Gimon
  Web Coordinator
  Minneapolis Public Library






> -----Original Message-----
> From: Maribeth Manoff [mailto:manoff at aztec.lib.utk.edu] 
> Sent: Thursday, January 11, 2001 1:56 PM
> To: Multiple recipients of list
> Subject: [WEB4LIB] Web Log Analysis
> 
> 
> Hello All,
> 
> I am working on doing something with the logs generated by our Web
> server (something other than deleting them, that is :)  I found some
> good information in the list archives on Web log analysis software, as
> well as a good article in Online magazine on this topic.  I downloaded a
> trial version of WebTrends Log Analysis software, and got it to work
> with our logs.  What I don't have a good sense of, though, is what is
> the information that I really want or need?  I plan to talk to other
> librarians here to get their input, and I would like to ask for your
> assistance also.  If you have the time, could you reply to me (I will
> happily summarize for the list) with answers to the following questions:
> 
> 1)  What types of statistics are you collecting on your library Web site
> usage?
> 
> 2)  What software are you using to collect these statistics?
> 
> Thanks very much,
> Maribeth 
> -- 
> ----------------------------------------------------
> Maribeth Manoff
> Coordinator for Networked Service Integration
> 647 Hodges Library               mmanoff at utk.edu
> The University of Tennessee      voice: 865-974-2876
> Knoxville, TN 37996-1000         fax: 865-974-0626
> 

