It had been a freight train headed my way for some time; I could see the pressure building weeks in advance. In the middle of last week the levee finally broke, first at 10pm, then at 6pm, and finally at about noon:
TRAFFIC ALERT FOR teledyn.com
99% of your daily traffic quota has been used with 12:50.16 of your daily cycle elapsed.
At the top of the webstats, the smoking gun: 30,000 requests for the Drupal-generated RSS feed from teledyn.com.
What about If-Modified-Since?
That was my first thought: these can't be 30,000 unique requests, so why don't they all just get 304 Not Modified responses telling them I haven't posted a new story to that site in days? Isn't that what the RSS protocol is all about?
It is supposed to conserve bandwidth, not leave the tap running.
That's the intent, but the reality depends on many things, not least the assumption that all these RSS readers pay attention to the caching headers and properly obey the fine points of the protocol ... most, it seems, do not.
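For contrast, here is roughly what "paying attention to the caching headers" means on the reader's side: remember the ETag and Last-Modified validators from the last full (200) response, send them back verbatim on the next poll, and treat a 304 as "keep what you have." This is a minimal standard-library Python sketch, not any particular reader's code; `CachedFeed`, `conditional_headers`, `update_cache` and `poll` are hypothetical names.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CachedFeed:
    """What a polite reader keeps about one feed between polls (hypothetical structure)."""
    body: bytes = b""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cache: CachedFeed) -> dict:
    """Validator headers for the next poll, echoing back exactly what the server sent."""
    headers = {}
    if cache.etag:
        headers["If-None-Match"] = cache.etag
    if cache.last_modified:
        # Send the server's own Last-Modified string verbatim, not a locally computed time.
        headers["If-Modified-Since"] = cache.last_modified
    return headers


def update_cache(cache: CachedFeed, status: int, headers: Mapping[str, str], body: bytes) -> CachedFeed:
    """Fold one poll result into the cache."""
    if status == 200:
        return CachedFeed(body=body, etag=headers.get("ETag"),
                          last_modified=headers.get("Last-Modified"))
    # 304 (or anything unexpected): keep the copy we already have.
    return cache


def poll(url: str, cache: CachedFeed) -> CachedFeed:
    """One conditional GET against a live feed URL."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return update_cache(cache, response.status, response.headers, response.read())
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return cache
        raise
```

A reader that skips this bookkeeping, or sends a date of its own choosing, gets the whole feed every time it polls.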
MovableType may be more forgiving, accepting the loose "later than this" definition of If-Modified-Since, but on the strength of an IETF caution the Drupal crew have opted for the more foolproof narrow interpretation of "since this exact time", and that, my theory goes, is stricter than the way a lot of RSS readers are playing it.
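To make the two readings concrete, here is a minimal Python sketch of how a feed server might decide between a 200 and a 304. It is not Drupal's or MovableType's actual code; the function names are hypothetical, and it assumes the usual HTTP date format (e.g. `Wed, 29 Oct 2003 10:00:00 GMT`).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _parse_http_date(value: str) -> Optional[datetime]:
    """Parse an HTTP date such as 'Wed, 29 Oct 2003 10:00:00 GMT'; None if malformed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if parsed.tzinfo is None:
        # '-0000' dates come back naive; treat them as UTC so comparisons work.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def not_modified_strict(if_modified_since: Optional[str], last_modified: str) -> bool:
    """Narrow reading: 304 only if the client echoes our Last-Modified string exactly."""
    return if_modified_since == last_modified


def not_modified_loose(if_modified_since: Optional[str], last_modified: str) -> bool:
    """Loose reading: 304 if the client's date is at or after our last change."""
    if if_modified_since is None:
        return False
    since = _parse_http_date(if_modified_since)
    changed = _parse_http_date(last_modified)
    if since is None or changed is None:
        return False
    return changed <= since


def feed_status(request_headers: Mapping[str, str], last_modified: str, strict: bool = True) -> int:
    """Status code the feed server would send: 304 Not Modified or a full 200."""
    check = not_modified_strict if strict else not_modified_loose
    return 304 if check(request_headers.get("If-Modified-Since"), last_modified) else 200
```

The difference shows up with a reader that sends its own clock time, or some other date, instead of echoing Last-Modified: the loose check answers 304, while the strict check sends the whole feed again.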
Whatever the cause, it's the RSS that's chewing the bytes, and long-term the prognosis is not rosy for RSS as an easy way to track changes to a website. The problem is, of course, topological: the number of arcs into any one node will always have hard limits, something well known to anyone who has suffered the Slashdot effect.
Hosting an RSS feed, advertising that easy polling arc to your site, is like having a permanent listing on the front page of Slashdot.
The Bottleneck Problem
The way RSS is meant to work, everyone subscribes to a small file on your site. The critical word there is 'everyone'.
The RadioUserland aggregator accounts for a noticeable chunk of my stats, and while Lawrence assures me the Manila web-based aggregator will cache across the server (each client gets the same cached copy), most Userland users are running the client-side version, and that one doesn't.
So I'm a victim of my own success? I bought some time with a quick hack, forking Drupal to strip HTML from the RSS descriptions and chop each one down to its intro sentence. That cut the feed from 24k of summaries to just over 4k, but at a significant loss of function, and it's only a stop-gap: it slows the faucet back to a drip, knowing all the while that each drop falls faster than the last.
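The arithmetic behind that trim, using the figures above (roughly 30,000 feed requests a day, at 24k each versus just over 4k each), can be sketched in a few lines of Python. It assumes "k" means 1,000 bytes and ignores HTTP header overhead; `daily_transfer_mb` is a hypothetical helper.

```python
def daily_transfer_mb(requests_per_day: int, bytes_per_response: int) -> float:
    """Megabytes (10**6 bytes) served per day if every request gets the full body."""
    return requests_per_day * bytes_per_response / 1_000_000


REQUESTS_PER_DAY = 30_000  # from the webstats above
for label, size in [("full summaries", 24_000), ("intro sentence only", 4_000)]:
    print(f"{label}: {daily_transfer_mb(REQUESTS_PER_DAY, size):.0f} MB/day")
# full summaries: 720 MB/day
# intro sentence only: 120 MB/day
```

That is roughly a sixfold cut, but the total still grows in lockstep with the number of requests.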
My feed is now less inviting, and I've only bought some time.
Unsustainable Growth
Herein lies the black hole of RSS: if your feed works, if you succeed in attracting subscriptions on a global scale, if you do it right, you are doomed.
As friends tell friends, as links lead to visits and visits to subscribers, the snowball rolls on towards a day like last Friday. RSS may have the potential to save bandwidth, but when you are getting hit once an hour or more by thousands of sites, 24,000 extra hits adds up, and it's all the worse when so many are using broken clients that ignore the caching rules.
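A back-of-the-envelope model of that snowball: each subscriber polling once an hour costs 24 requests a day, so 1,000 subscribers alone account for 24,000 hits. The Python sketch below also splits clients into broken ones that always download the whole feed and well-behaved ones that get a short 304. The broken fraction and the roughly 300-byte size of a 304 response are illustrative assumptions, as is the feed staying unchanged all day, so that a polite client never needs the full body.

```python
def daily_requests(subscribers: int, polls_per_hour: float = 1.0) -> float:
    """Requests per day if each subscriber polls at a steady rate."""
    return subscribers * polls_per_hour * 24


def daily_bytes(subscribers: int, feed_bytes: int, polls_per_hour: float = 1.0,
                broken_fraction: float = 1.0, not_modified_bytes: int = 300) -> float:
    """Bytes per day, assuming the feed did not change all day.

    Broken clients ignore the caching headers and fetch the whole feed every time;
    the rest get a 304 of roughly not_modified_bytes (an assumed header size).
    """
    requests = daily_requests(subscribers, polls_per_hour)
    full = requests * broken_fraction * feed_bytes
    cheap = requests * (1 - broken_fraction) * not_modified_bytes
    return full + cheap


for subs in (100, 1_000, 10_000):
    print(subs, daily_requests(subs), daily_bytes(subs, 4_000, broken_fraction=0.5))
```

Both numbers scale linearly with subscribers: every tenfold jump in readership is a tenfold jump in load on the one host.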
An RSS Network
I'm not sure there is an obvious solution. We might install globally distributed caching of RSS files on behalf of the sites, isolating them from the broken clients, but isn't that already in place? Isn't that what proxy servers for AOL and other ISPs do to "speed up your internet" already?
If the worst offenders really are all those Userland clients (and maybe the same goes for the other RSS readers emerging in their wake), perhaps these companies should be encouraged to proxy their clients' requests through their own servers on the way to the RSS host site.
That way the host only needs to field one call per hour for all of the Userland clients, and we can be reasonably sure that this one call will use the correct ETag headers and play nice with RSS space.
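A minimal Python sketch of such a vendor-side proxy, under stated assumptions: the origin is reached through a caller-supplied `fetch(url, etag)` function (a hypothetical interface standing in for a real HTTP conditional GET), and freshness is a fixed one-hour TTL. However many clients ask, the origin sees at most one request per feed per hour, and that request carries the stored ETag, so an unchanged feed costs only a 304.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin interface: fetch(url, etag) -> (status, etag, body).
# A real proxy would do a conditional GET with If-None-Match here.
Fetcher = Callable[[str, Optional[str]], Tuple[int, Optional[str], bytes]]


@dataclass
class _Entry:
    body: bytes
    etag: Optional[str]
    checked_at: float


class FeedProxy:
    """Shared cache: every client gets the same copy; the origin is asked at most once per TTL."""

    def __init__(self, fetch: Fetcher, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, _Entry] = {}
        self.origin_calls = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry.checked_at < self._ttl:
            return entry.body  # still fresh: no trip to the origin
        self.origin_calls += 1
        status, etag, body = self._fetch(url, entry.etag if entry else None)
        if status == 200:
            self._cache[url] = _Entry(body, etag, now)
            return body
        if entry is not None:
            # 304 means our copy is still good; on other errors, serve it stale.
            # Either way restart the clock so the origin is not hammered.
            entry.checked_at = now
            return entry.body
        raise RuntimeError(f"origin returned {status} for {url}")
```

Serving the stale copy when the origin errors is a design choice: it keeps thousands of clients quiet for the hour instead of sending them all back to the host.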
Or maybe this can be done without servers at all, directly in a new generation of RSS readers that relay feeds over a peer-to-peer network the way a Gnutella/Kazaa client relays another node's collection down the wire. You'd subscribe to my feed, but your client would autodiscover that a closer friend of yours, someone whose RSS-aggregation feed you browse, can provide this feed to you by proxy.
In fact, you may not need my feed at all if this aggregator buddy's feed has collected my posts along with other open-source hippie sites and can provide you with a composite feed where the news changes hourly, instead of my lazy two-days-maybe publishing cycle.
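One way the autodiscovery step could work, sketched in Python: treat subscriptions as a graph of who reads whose aggregated feed, and search outward breadth-first for the nearest peer that already carries the feed, falling back to the origin if none does. "Closer" is assumed here to mean fewest hops; the `friends` and `carries` maps and the `nearest_relay` function are hypothetical stand-ins for whatever a real client would learn from its peers.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def nearest_relay(start: str, feed_url: str,
                  friends: Dict[str, Iterable[str]],
                  carries: Dict[str, Set[str]]) -> Optional[str]:
    """Breadth-first search outward from `start` for the closest peer already carrying feed_url.

    Returns None when no reachable peer has it, meaning: fetch from the origin site.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        if peer != start and feed_url in carries.get(peer, set()):
            return peer
        for neighbour in friends.get(peer, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None
```

Because the search goes level by level, a direct friend who carries the feed always wins over a friend-of-a-friend.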
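The composite feed itself is mostly a merge: pool the items from several feeds, keep one copy of any post that arrived by more than one route, and sort newest first. A minimal Python sketch, assuming each item has a stable guid and a publication timestamp; the `Item` type and `composite_feed` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    guid: str          # assumed stable across relays (e.g. the post's permalink)
    title: str
    published: datetime


def composite_feed(feeds: Iterable[Iterable[Item]], limit: int = 20) -> List[Item]:
    """Merge several feeds into one, newest first, one copy per guid."""
    by_guid: Dict[str, Item] = {}
    for feed in feeds:
        for item in feed:
            # The same post may arrive via several relays; keep the most recent version.
            current = by_guid.get(item.guid)
            if current is None or item.published > current.published:
                by_guid[item.guid] = item
    return sorted(by_guid.values(), key=lambda i: i.published, reverse=True)[:limit]
```

Deduplicating by guid is what lets a reader subscribe to one buddy's composite feed without seeing my posts twice when they also arrive from somewhere else.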
The astute among you, or more precisely Joey and Paul, who are both pretty astute, will instantly recognize this as the cascading aggregator network proposal we were mulling in those final days of OpenCola, right before we were all downsized into obscurity. That may mean little to most readers, except as reassurance that we had already worked out a lot of the fiddly details of how such a cascading aggregator would have to work, and even some ideas on how to extract meaningful statistics from it.
Only now, with last week's discovery of the great sucking sound RSS can make when set loose in the real world, and considering the great wealth of knowledge channels increasingly, if naively, setting sail at the edge of that whirlpool, maybe, just maybe, it's time for someone to dust off our notes and reconsider whether a P2P RSS reader might not be such a bad idea after all ...
- mrG's blog