Discarding entries in log from robots and such

Zimoski, Tom Tom.Zimoski at fresnolibrary.org
Thu Sep 16 18:46:17 EDT 2004


Previously on web4lib:
"After discarding all of the entries from robots, spiders and crawlers
which could be handled automatically by looking at the browser
information..."
________________________________________________________________________


Here are a couple of lines from the log I get from our system
administrator:

64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedjust.html HTTP/1.0" 200 8268

64.68.82.30 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedmil.html HTTP/1.0" 200 6903

With only this to work with, is there any practical way to exclude
entries from robots and the like?  It seems like the information at
http://www.iplists.com/ and
http://www.searchengineworld.com/spiders/spider_ips.htm could be
helpful, but my time might be better spent trying to get user agent
information included in the log.
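
For what it's worth, here is a minimal sketch of the IP-based approach in
Python, assuming the log stays in the format shown above (host, two
dashes, timestamp, request, status, size) with each entry on one line.
The names ROBOT_NETWORKS, is_robot and drop_robot_entries are made up
for illustration, and the single /24 network is taken from the two
sample lines only as a placeholder; whether those addresses really
belong to a crawler is an assumption, and a real list would have to be
built from a maintained source such as the pages mentioned above.

```python
import ipaddress
import re

# Hypothetical placeholder list. The one network here is borrowed from
# the sample log lines purely for illustration; a real list of robot
# address ranges would have to be built and maintained separately.
ROBOT_NETWORKS = [ipaddress.ip_network("64.68.82.0/24")]

# Matches a log entry of the form:
#   host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def is_robot(host, networks=ROBOT_NETWORKS):
    """Return True if host is an IP address inside one of the networks."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # The host field holds a name rather than an IP address.
        return False
    return any(addr in net for net in networks)


def drop_robot_entries(lines, networks=ROBOT_NETWORKS):
    """Return the log lines whose host is not in a listed network."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            # Keep anything that does not parse, so nothing is lost silently.
            kept.append(line)
            continue
        if not is_robot(match.group("host"), networks):
            kept.append(line)
    return kept
```

Calling drop_robot_entries on the lines of the log file returns only the
entries whose address falls outside the listed networks. The obvious
weakness is that the result is only as good as the address list, which
is part of why user agent information in the log may still be the
better route.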

Thanks for your attention.

Tom Zimoski
Reference Dept/Fresno County Library



