[WEB4LIB] Controlling Bloglines crawler?

Thomas Dowling tdowling at ohiolink.edu
Wed Nov 10 16:05:13 EST 2004


Thomas Dowling wrote:

>Has anyone encountered a way to throttle (in either sense of the word!) 
>the Bloglines RSS crawler?  We have a server with a large number of 
>feeds.  While Bloglines doesn't fetch any individual feed more than once 
>an hour, they tend to fire off requests for all of our feeds in rapid 
>succession, creating bursts of heavy traffic that can do bad things if 
>they come in during an already high load on the server.  Any chance 
>they'd understand a 304 Not Modified HTTP status?

Thanks to the people who responded with hints (more polite than "RTF 
Standard!") about the <ttl> and <skipHours> elements in RSS.  They do 
exactly what I want (respectively, tell an RSS reader not to pull 
updates more often than every X minutes, and tell a reader not to pull 
feeds during certain hours of the day).
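
For anyone who wants to try these elements, here is a minimal sketch in 
Python of a channel that carries both.  It assumes RSS 2.0, where <ttl> 
is a number of minutes and <skipHours> holds <hour> children numbered 0 
to 23 in GMT.  The function name build_channel and the example.org feed 
details are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_channel(title, link, description, ttl_minutes, skip_hours):
    """Return an RSS 2.0 document (as a string) with <ttl> and <skipHours>.

    Hypothetical helper for illustration only; item elements are omitted.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    # <ttl>: minutes a reader may cache the feed before fetching it again.
    ET.SubElement(channel, "ttl").text = str(ttl_minutes)

    # <skipHours>: hours of the day (0-23, GMT) when readers should not fetch.
    skip = ET.SubElement(channel, "skipHours")
    for hour in sorted(set(skip_hours)):
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
        ET.SubElement(skip, "hour").text = str(hour)

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Example values are placeholders, not a real feed.
    print(build_channel("Example feed", "http://example.org/",
                        "Illustrative channel", 60, [14, 15, 16]))
```

Both elements are only advisory: a reader or crawler has to choose to 
honor them.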

Unfortunately, Bloglines disregards both elements; on inspection, I see 
my copy of RSSReader does too.  I'll experiment with returning a 304 
Not Modified HTTP status instead.
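
A rough sketch of the 304 logic, in Python, for anyone following along. 
It assumes the client sends an If-None-Match or If-Modified-Since 
request header (whether Bloglines does is not established here), that 
headers arrive as a plain dict, and that last_modified is a 
timezone-aware datetime.  The name feed_response_status is hypothetical, 
and ETags are compared as plain strings, ignoring weak validators.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def feed_response_status(request_headers, last_modified, etag=None):
    """Return 304 if the client's cached copy is still current, else 200."""
    # Prefer the ETag validator when both sides supply one.
    if_none_match = request_headers.get("If-None-Match")
    if etag is not None and if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if (etag in candidates or "*" in candidates) else 200

    # Otherwise fall back to the date validator.
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return 200  # unparseable date: send the full feed
        if since_dt.tzinfo is None:
            since_dt = since_dt.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since_dt:
            return 304
    return 200


if __name__ == "__main__":
    changed = datetime(2004, 11, 10, 21, 5, 13, tzinfo=timezone.utc)
    headers = {"If-Modified-Since": format_datetime(changed, usegmt=True)}
    print(feed_response_status(headers, changed))
```

Note that a 304 only spares the server from generating and sending the 
feed body; the crawler still makes the same number of requests.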


Thomas Dowling
tdowling at ohiolink.edu



