Discarding entries in log from robots and such
Zimoski, Tom
Tom.Zimoski at fresnolibrary.org
Thu Sep 16 18:46:17 EDT 2004
Previously on web4lib:
"After discarding all of the entries from robots, spiders and crawlers
which could be handled automatically by looking at the browser information..."
________________________________________________________________________
Here are a couple of lines from the log I get from our system
administrator:
64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedjust.html HTTP/1.0" 200 8268
64.68.82.30 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedmil.html HTTP/1.0" 200 6903
With only this to work with, is there any practical way to exclude entries
from robots and the like? The IP lists at http://www.iplists.com/ and
http://www.searchengineworld.com/spiders/spider_ips.htm look like they
could be helpful, but my time might be better spent trying to get user
agent information included in the log.
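For what it's worth, here is a rough sketch of what the IP-list approach might look like in Python. It assumes the log is in Common Log Format, as the two sample lines suggest. The names ROBOT_NETWORKS, is_robot and filter_log are made up for illustration, and the single 64.68.82.0/24 range is only a placeholder chosen to cover the sample addresses; a real list would have to be built from one of the published sources.

```python
import ipaddress
import re

# Placeholder list of robot address ranges (an assumption for illustration).
# In practice this would be loaded from a published IP list.
ROBOT_NETWORKS = [ipaddress.ip_network("64.68.82.0/24")]

# Common Log Format: host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def is_robot(host):
    """Return True if host is an IP address inside a listed robot range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # The host field holds a name rather than an address; no match possible.
        return False
    return any(addr in net for net in ROBOT_NETWORKS)


def filter_log(lines):
    """Return the log lines whose client address is not in a robot range.

    Lines that do not parse as Common Log Format are kept, so nothing
    is silently dropped.
    """
    kept = []
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None or not is_robot(match.group("host")):
            kept.append(line)
    return kept
```

Calling filter_log on the lines of the log file would return only the entries from addresses outside the listed ranges. The obvious weakness is that the result is only as good as the IP list, which is why user agent information in the log would probably still be worth asking for.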
Thanks for your attention.
Tom Zimoski
Reference Dept/Fresno County Library