[WEB4LIB] Re: Web Log Analysis

Gimon, Charles A CAGimon at mpls.lib.mn.us
Fri Jan 12 10:17:05 EST 2001


Well, this is the core of what I do to get the hits per page for the month:

[begin code]

opendir(LOGDIR, $logfiledir) or die "can't open $logfiledir: $!";
#grab the logs for this year/month
@files = grep(/^ex$yr$mo\d\d\.log$/i, readdir(LOGDIR));
closedir(LOGDIR);

for (@files) {
   open(F, "$logfiledir/$_") or die "can't open $_: $!";
   while (<F>) {
      chomp $_;
      my $internal = 0;
      my $staff = 0;
      $_ =~ s/ - / X /;        #keep the field count stable when splitting
      my @p = split(/\s/,$_);  #split the line into an array on whitespace
      my $file = lc($p[4]);    #lc() returns the lowercased copy
      if (($file =~ /(?:htm|asp|pl)$/i) && ($file !~ /demo/i)) {  
                     #if it's a file I want to count
         if ($p[1] =~ /^10\./) {$internal = 1}                   
                     # if it's one of our internal IPs
         if ($p[1] =~ /^209\.105\.(?:7[6789]|89)/) {$internal = 1}
         if ($p[1] =~ /^198\.174\.51/) {$internal = 1}
         if ($file =~ /(?:staff|stf)/i) {    
                     # if it's in our staff directory
            $file =~ s/\/staff//i;
            $file =~ s/\/stf//i;
            $staff = 1;
            }
         $gtot{$file}++;
         if ($staff) {
            $totsta{$file}++;
            } elsif ($internal) {
            $totint{$file}++;
            } else {
            $totext{$file}++;
            }
         }
      }
   close(F);
   }


for (sort keys %gtot) {
   printf("%s : %d : %d : %d : %d\n", $_, $gtot{$_},
      $totsta{$_} || 0, $totint{$_} || 0, $totext{$_} || 0);
   }

[end code]

This produces output that looks like this:

/database.asp : 4723 : 2866 : 512 : 1345

I can pull that into Excel and do further things with it. It wouldn't be a
big deal to have perl output the entire report, but the above gives you the
basic idea on one piece of paper.
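If you'd rather skip the import step, that same loop can just as easily write
a comma-separated file that Excel opens directly. A minimal sketch -- the hash
names match the script above, but the sample data (taken from the output line
above) and the report.csv filename are my own:

```perl
# Sketch: write the per-page totals as CSV so Excel can open the
# file directly. Hashes are stocked here with the sample figures
# from the output line above; normally the counting loop fills them.
my %gtot   = ('/database.asp' => 4723);
my %totsta = ('/database.asp' => 2866);
my %totint = ('/database.asp' => 512);
my %totext = ('/database.asp' => 1345);

open(CSV, ">report.csv") or die "can't write report.csv: $!";
print CSV "page,total,staff,internal,external\n";
for (sort keys %gtot) {
   printf CSV ("%s,%d,%d,%d,%d\n", $_, $gtot{$_},
      $totsta{$_} || 0, $totint{$_} || 0, $totext{$_} || 0);
   }
close(CSV);
```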

Like I say, this is customized for our setup, and I take this core and
customize it further to meet special requests. I change the directory I'm
reading, the specific files I'm requesting (both log files and web pages
being hit), etc. etc. Also, we are running IIS, which allows you to pick and
choose which data you want to have logged. You won't be able to just use
this as is.
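For instance, on an Apache box using the standard common log format the
fields land in different positions than in my IIS logs -- a sketch, with the
indices assumed from the common log layout (client IP in field 0, requested
path in field 6 after splitting on whitespace) and a made-up sample line:

```perl
# Sketch for an Apache common-log line instead of IIS: once the
# line is split on whitespace, the client IP is field 0 and the
# requested path is field 6. The sample line is invented.
my $line = '10.0.0.5 - - [12/Jan/2001:10:17:05 -0500] "GET /database.asp HTTP/1.0" 200 1234';
my @p    = split(/\s/, $line);
my $ip   = $p[0];        # client IP/hostname
my $file = lc($p[6]);    # requested path
```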

In perl terms, this is "first week of class" stuff, which is why I hate to
hear about people spending big bucks on "log analyzers". If I were to
actually spend money on something like this, I would want it to:

--know that the x-th field was IP address/hostname, just by looking at the
contents, no matter what server I'm using or how it's configured
--be able to track the paths visitors take through the site, and analyze
them
--be able to give some idea of the time between page hits for visitors that
click from one page to another (this isn't "time spent in the site", but a
rough facsimile of it)
--be able to customize absolutely everything, and I mean *everything*

and even then I'd be looking at the price tag very suspiciously. 
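The "time between page hits" item, for what it's worth, only takes a hash
keyed on visitor IP that remembers the last timestamp seen. A rough sketch,
with made-up (ip, epoch-seconds) pairs standing in for parsed log entries:

```perl
# Rough sketch of per-visitor time between page hits: remember the
# last timestamp seen for each IP and record the gap on each
# subsequent hit. The (ip, epoch-seconds) pairs are invented.
my @hits = (
   ['10.0.0.5', 979310225],   # first page
   ['10.0.0.5', 979310290],   # same visitor, 65 seconds later
   ['10.0.0.9', 979310300],   # a different visitor
);

my (%lastseen, @gaps);
for my $hit (@hits) {
   my ($ip, $time) = @$hit;
   push(@gaps, $time - $lastseen{$ip}) if exists $lastseen{$ip};
   $lastseen{$ip} = $time;
   }
# @gaps now holds one click-to-click interval per repeat visit
```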


--Charles Gimon
  Web Coordinator
  Minneapolis Public Library

> -----Original Message-----
> From: Mary Pugh [mailto:mpugh at orcaslibrary.org] 
> Sent: Thursday, January 11, 2001 5:34 PM
> To: Multiple recipients of list
> Subject: [WEB4LIB] Re: FW: Web Log Analysis
> 
> 
> Charles,
>   Would you be willing to share your script?
> 
> 
> At 02:40 PM 1/11/01 -0800, Gimon, Charles A wrote:
> >1. I produce and distribute a monthly report which includes:
> >
> >--Unique Daily Visitors for the month
> >--Total Page Hits overall
> >--Top Pages overall
> >--Top Categories in the LIST (our web directory)
> >--Top Subscription Databases by Clickthrough
> >--Top Links in the LIST by Clickthrough
> >--Search Queries in the LIST, and for our entire site
> >--Usage by Domain
> >--Usage by Browser/Platform
> >--Robot Activity
> >--Error logs
> >--Referer logs
> >
> >2. I've always written this stuff from scratch in perl, and 
> customized it
> >for my needs. (I can't stand overpriced, underfeatured 
> pre-written software
> >for little tasks like this...) This sort of thing isn't 
> rocket science; the
> >only thing even possibly off-putting about it is that the 
> files you're
> >working with can get awfully large.
> >
> >--Charles Gimon
> >   Web Coordinator
> >   Minneapolis Public Library
> 
> 
> 
> Mary Pugh				Orcas Island Library District
> Administrative Assistant		500 Rose Street
> Network Administrator			Eastsound, WA 98245
> www.orcaslibrary.org			360.376.4985
> 

