[WEB4LIB] inflated Web server stats for PDFs

Thu May 18 21:40:15 EDT 2000

Yes - I have seen this on the NT server we use for some of our digital library projects.  The 'follow-on' accesses are clearly identifiable in the access log by an HTTP status code of 206, which is defined as Partial Content (HTTP_PARTIAL_CONTENT).  We have chosen to filter those out of much of our processing.
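
For anyone who wants to try the same filtering, here is a rough sketch in Python.  It is not the script we run; it assumes the log is in Common Log Format (your server's layout may differ), and the regular expression, the function name count_requests and the sample lines are all made up for illustration.  It counts requests per file and skips any line whose status is 206.

```python
import re
from collections import Counter

# Assumption: Common Log Format. Adjust the pattern to your server's log layout.
CLF = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
)

def count_requests(lines, skip_statuses=("206",)):
    """Count requests per path, ignoring lines whose status is in skip_statuses."""
    counts = Counter()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # unparseable line: skip it rather than guess
        if m.group("status") in skip_statuses:
            continue  # partial-content "follow-on" request
        counts[m.group("path")] += 1
    return counts

# Hypothetical sample lines, invented for this sketch.
SAMPLE = [
    '10.0.0.1 - - [18/May/2000:13:57:00 -0700] "GET /reserves/a.pdf HTTP/1.0" 200 5242880',
    '10.0.0.1 - - [18/May/2000:13:57:01 -0700] "GET /reserves/a.pdf HTTP/1.0" 206 65536',
    '10.0.0.1 - - [18/May/2000:13:57:01 -0700] "GET /reserves/a.pdf HTTP/1.0" 206 65536',
    '10.0.0.2 - - [18/May/2000:14:02:10 -0700] "GET /reserves/b.pdf HTTP/1.0" 200 1048576',
]

if __name__ == "__main__":
    for path, n in sorted(count_requests(SAMPLE).items()):
        print(path, n)
```

One caveat: if a client's first request for a file is itself answered with 206, dropping every 206 line may undercount that file, so it is worth comparing filtered and unfiltered totals before relying on either.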

-Edward Spodick, Systems Librarian
Hong Kong University of Science & Technology
lbspodic at ust.hk

At 1:57 PM -0700 18/5/00, Charles Dean wrote:
>Hello:
>
>I'm seeing strange outliers in our monthly Web server traffic statistics -- huge numbers of requests for a given PDF file (usually an e-reserve).  The numbers exceed all reasonable accounting for heavy or peak demand during a semester.  One file was requested over 24,000 times, when typical traffic for other PDFs ranged from 100 to 300 requests that month.
>
>One library here did some initial testing using Acrobat Reader (3.0 and 4.0) as both a plug-in and a helper application in Netscape.  They found the plug-in configuration resulted in multiple requests to the server for "chunks" of the same file, as many as 15 requests/second for PDF files larger than 2 MB.  The helper app configuration produced a single request per file regardless of size.
>
>Granted, the test file is a large file (maybe not so large anymore? -- the one in production that got 24,000 hits was 5 MB), but the behavior is a little unsettling and would certainly bear on how we interpret our Web resource statistics.
>
>Anyone seen anything similar, or know more about the way Netscape/Reader treats large PDF files for download and display?
>
>Regards and thanks,
>
>Charles
>
>
>
>
>Charles W. Dean
>Library Technology Group
>University of Wisconsin-Madison
>cdean at library.wisc.edu
>(608) 265-2844
>http://www.library.wisc.edu/