OSCON day 1: Beyond REST? Building Data Services with XMPP PubSub

It’s good to be back in Portland for my favorite geek convention: O’Reilly’s Open Source Conference. The overcast sky in Portland is making it a little easier this year to focus on the plethora of excellent speakers and sessions. The first session to really grip me and speak to me was Rabble and Kellan’s “Beyond REST? Building Data Services with XMPP PubSub” presentation.

They opened their presentation by stating that they were not “Jabber Heads” but people in the business of building web sites. For Rabble and Kellan, Jabber is one more tool in their huge tool chest for building web sites. Jabber wasn’t designed to be part of a functioning web site, but they insist that it works great for building social web sites that need to notify many people of updates.

For example, Kellan talked about FriendFeed, a site that lets its users know when their friends share new items. Kellan pointed out that FriendFeed polls Flickr 2.9 million times in order to check for updates for 45 thousand users, and of those 45 thousand users, only 6.7 thousand are logged in at any one time. This, of course, is a poor way of checking for changed content. As Kellan put it: “Polling sucks!”
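Putting those figures side by side makes the waste concrete. The numbers above come with no time window for the 2.9 million polls, so this is only a ratio, not a rate:

```latex
\[
\frac{2.9 \times 10^{6}\ \text{polls}}{45\,000\ \text{users}} \approx 64\ \text{polls per user},
\qquad
\frac{6\,700\ \text{logged in}}{45\,000\ \text{users}} \approx 15\%
\]
```

In other words, at any one time roughly 85% of the users being polled for are not even logged in to see the result.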

To solve this problem, it’s key to leave standard REST web services behind and use message passing, which notifies users of changed content directly instead of waiting for them to ask. The open and mature infrastructure Rabble and Kellan found for this job is Jabber, which has 10 years of experience passing messages around the internet and has been embraced by many companies, including Google.
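The difference between polling and message passing fits in a few lines. This is a minimal in-process sketch, not XMPP: the hypothetical Hub class stands in for a Jabber server, and subscribers are plain callbacks that register once and are then invoked for every published update, instead of repeatedly asking whether anything changed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class Hub:
    """Hypothetical stand-in for a message-passing server (not XMPP itself)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, node: str, callback: Callback) -> None:
        """Register once; from then on every update to `node` is pushed."""
        self._subscribers[node].append(callback)

    def publish(self, node: str, payload: str) -> int:
        """Push `payload` to each subscriber of `node`; return how many got it."""
        callbacks = self._subscribers.get(node, [])
        for callback in callbacks:
            callback(node, payload)
        return len(callbacks)


if __name__ == "__main__":
    hub = Hub()
    hub.subscribe("flickr/alice", lambda node, item: print(node, "->", item))
    hub.publish("flickr/alice", "new photo")   # delivered immediately
    hub.publish("flickr/bob", "nobody cares")  # no subscribers, no work
```

The cost model is the point: one delivery per update per subscriber, and a node with nothing new costs nothing, whereas a poller pays on every check whether or not anything changed.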

XMPP, Jabber’s protocol, works well for message passing and avoids many of the problems and limitations of HTTP:

  1. XMPP works over persistent connections
  2. It is stateful (SSL becomes cheap)
  3. Designed as an event stream protocol
  4. Natively federated and asynchronous
  5. Identity, security and presence are built in.
  6. Jabber servers are built and deployed to do this stuff.
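Points 1 and 3 together mean that a client reads a single long-lived XML document whose top-level children, the stanzas, arrive one at a time. A minimal sketch using the standard library's XMLPullParser, with the socket simulated by a list of string chunks (an assumption for illustration); it skips TLS, SASL, stream features and everything else a real client library handles.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Optional


def stanzas(chunks: Iterable[str]) -> Iterator[ET.Element]:
    """Yield each top-level stanza of an XMPP stream as soon as it is complete.

    The <stream:stream> root stays open for the life of the connection, so the
    parser is never closed; every direct child of the root is emitted when its
    end tag arrives, however the bytes were split across chunks.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root: Optional[ET.Element] = None
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if depth == 0:
                    root = elem  # the <stream:stream> element itself
                depth += 1
            else:
                depth -= 1
                if depth == 1 and root is not None:
                    yield elem
                    root.remove(elem)  # drop delivered stanzas so memory stays flat


if __name__ == "__main__":
    wire = [
        "<stream:stream xmlns='jabber:client' "
        "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>",
        "<message to='a@example.com'><body>hel",
        "lo</body></message><presence/>",
    ]
    for stanza in stanzas(wire):
        print(stanza.tag)
```

Note that the message body is split across two chunks and still comes out as one complete stanza; that is what "event stream over a persistent connection" buys compared with one HTTP request per update.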

Given this, Kellan and Rabble decided to piggyback a notification system on Jabber by sending XML fragments using a PubSub (publish-subscribe) paradigm. In this context, PubSub is simply a method for passing XMPP pubsub stanzas via Jabber: nothing more than a convention for how to send XML over Jabber, including a way to embed Atom fragments in that XML.
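For a concrete picture of that convention, here is a sketch that builds an XEP-0060 (Publish-Subscribe) publish request carrying an Atom entry, using only Python's standard library. The service address, node name and item values are hypothetical placeholders, and publish_stanza is an illustrative helper, not part of any library; a real client library would also send the stanza and match the server's reply by its id.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def publish_stanza(service: str, node: str, item_id: str, title: str,
                   link: str, updated: str, stanza_id: str) -> str:
    """Serialize an XEP-0060 publish request whose item payload is an Atom entry."""
    iq = ET.Element("iq", {"type": "set", "to": service, "id": stanza_id})
    # Namespaces are written as plain xmlns attributes so the output uses
    # default namespaces (no ns0: prefixes), the way stanzas are usually written.
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item", {"id": item_id})
    entry = ET.SubElement(item, "entry", {"xmlns": ATOM_NS})
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "link", {"rel": "alternate", "href": link})
    ET.SubElement(entry, "id").text = link  # sketch: reuse the permalink as the Atom id IRI
    ET.SubElement(entry, "updated").text = updated  # RFC 3339 timestamp
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    print(publish_stanza("pubsub.example.com", "photos/alice", "item-1",
                         "Sunset", "http://example.com/photos/1",
                         "2008-07-21T12:00:00Z", "pub1"))
```

Subscribers to the node then receive the same item wrapped in a message carrying a pubsub event payload, so the Atom entry travels unchanged from publisher to every subscriber.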

Rabble presented how XMPP works for FireEagle, Yahoo!’s new personal geolocation service that lets users share their current location with other users. For a few users and a few updates, you can paginate the data stream into RSS/Atom feeds. But once you have more than a few users and frequent updates, a paginated stream cannot keep up. What if a user publishes more updates than an RSS feed can capture? Updates get lost, and for applications using FireEagle, a missed update is a critical flaw. With a system like XMPP, FireEagle can rely on Jabber to deliver every update, which is exactly what Jabber was meant to do.
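To see why a paginated feed drops updates, here is a toy model with hypothetical numbers: a feed that only ever shows its newest feed_size entries, read by a client that polls after every poll_every new updates. Whenever more than feed_size updates land between two polls, the oldest ones scroll off before anyone sees them, while a push channel would hand over every one.

```python
from typing import List


def updates_seen_by_polling(total_updates: int, feed_size: int,
                            poll_every: int) -> List[int]:
    """Return which updates a polling client actually sees.

    Updates are numbered 0..total_updates-1. The feed only ever exposes the
    newest `feed_size` entries, and the client reads it after every
    `poll_every` new updates, plus once more after the last one.
    `feed_size` and `poll_every` are assumed to be positive integers.
    """
    poll_points = list(range(poll_every, total_updates + 1, poll_every))
    if not poll_points or poll_points[-1] != total_updates:
        poll_points.append(total_updates)
    seen: List[int] = []
    next_unseen = 0
    for published in poll_points:
        oldest_in_feed = max(0, published - feed_size)
        # Anything older than the feed window scrolled off before this poll.
        seen.extend(range(max(oldest_in_feed, next_unseen), published))
        next_unseen = published
    return seen


if __name__ == "__main__":
    seen = updates_seen_by_polling(total_updates=10, feed_size=3, poll_every=5)
    lost = sorted(set(range(10)) - set(seen))
    print("seen:", seen, "lost:", lost)
```

With a 3-entry feed polled every 5 updates, the client sees updates 2, 3, 4, 7, 8 and 9 and silently misses 0, 1, 5 and 6; making the feed at least as long as the polling gap closes the hole only until the update rate rises again.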

Kellan also applied XMPP/PubSub to Flickr, sketching how a Flickr update “firehose” might work. If Flickr sent a ~2 KB Atom-enriched packet for each new public photo posted, at a rate of 60 updates a second, it would take roughly a megabit per second of traffic. Even a normal DSL line can handle 1 Mbit/s, so the bandwidth is manageable at this scale, especially compared to the polling system that FriendFeed uses. (Kellan also pointed out that FriendFeed is not doing anything wrong at all; the current web-service-centric model is simply insufficient for this type of service.)
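The firehose estimate checks out as simple arithmetic, taking 1 KB as 1,000 bytes (with 1,024-byte kilobytes the result is about 0.98 Mbit/s):

```latex
\[
2\,\tfrac{\text{KB}}{\text{update}} \times 60\,\tfrac{\text{updates}}{\text{s}}
  = 120\,\tfrac{\text{KB}}{\text{s}}
  = 120\,000 \times 8\,\tfrac{\text{bit}}{\text{s}}
  = 960\,\tfrac{\text{kbit}}{\text{s}}
  \approx 1\,\tfrac{\text{Mbit}}{\text{s}}
\]
```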

To deploy your own message passing service based on XMPP/PubSub, you’ll need to follow these 4 easy steps:

  1. Get a Jabber client library. There are many available for all the popular languages.
  2. Set up a Jabber server; again, there are many to choose from. Turn off the features you won’t need (e.g., creating new accounts).
  3. Build a component (according to Jabber XEP-0114)
  4. Integrate the message passing system in your own site.
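Step 3 is less daunting than it sounds. Under XEP-0114 an external component opens a stream to the server in the jabber:component:accept namespace and authenticates by sending the lowercase hex SHA-1 digest of the server-chosen stream ID concatenated with a shared secret. A sketch of just those two pieces; the helper names, component domain, stream ID and secret are hypothetical, and socket handling, error handling and parsing of the server's reply are left out.

```python
import hashlib
from xml.sax.saxutils import quoteattr


def component_stream_header(component_domain: str) -> str:
    """Stream opening a component sends to its server (XEP-0114)."""
    return (
        '<stream:stream xmlns="jabber:component:accept" '
        'xmlns:stream="http://etherx.jabber.org/streams" '
        f"to={quoteattr(component_domain)}>"
    )


def handshake(stream_id: str, shared_secret: str) -> str:
    """<handshake/> proving knowledge of the secret without sending it.

    The value is the lowercase hex SHA-1 of the stream ID the server chose
    concatenated with the shared secret configured on both sides.
    """
    digest = hashlib.sha1((stream_id + shared_secret).encode("utf-8")).hexdigest()
    return f"<handshake>{digest}</handshake>"


if __name__ == "__main__":
    print(component_stream_header("firehose.example.com"))
    # The stream ID comes from the server's reply header; this one is made up.
    print(handshake("3BF96D32", "not-a-real-secret"))
```

Once the server answers with an empty handshake element, the component receives every stanza addressed to its domain (anything @firehose.example.com in this example).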

Pretty simple, overall! The beauty of this approach is that the new notification system is built entirely from off-the-shelf components. No new magic technology is being created to enable it, which is a personal metric of mine for judging whether a new system is likely to succeed.

It’s clear that REST web services do the heavy lifting for many Web 2.0 sites, but it’s also clear that REST and its inherent polling mechanism aren’t the best way to build a user notification system. With social networking sites not about to fade away, we’re going to see an increasing need for capable message-passing services. And since Jabber is a well-established and well-supported system, it only makes sense to piggyback on this great technology. Thanks for the awesome presentation, Rabble and Kellan!

Update: Rabble posted the slides for this talk.

(And a big thanks to Tim O’Reilly for letting me guest blog OSCON here!)

  • Great writeup, Robert! Rabble has posted the slides for his and Kellan’s talk on SlideShare:
    Beyond REST? Building Data Services with XMPP PubSub

  • Jim Stogdill

    Hey Robert, great post. Seeing lots of people beginning to use XMPP/Jabber this way lately. I wish I could have attended this.

    Did the presenters mention any other pub/sub implementations and/or their reasons for choosing XMPP over one of them? Perhaps something like Apache ServiceMix, for example?

  • Dear Internet,
    Atom is not an acronym.

  • Thanks for the great write up!

  • Robert Kaye


    They didn’t mention any other technologies they may have evaluated. One of the reasons they gave for using XMPP was that it is widely deployed and has a 10-year history, which makes it easy to work with.

  • My main gripe about XMPP is that its HTTP transport, BOSH, is pretty low-tech: it just uses long polling. In this one regard, XMPP seems somewhat lacking as an end-to-end solution. Otherwise, yes, this would be the right tool for the job.

    I hope you enjoy OSCON again this year; it is with great remorse that I had to skip it.

  • Lee

    I’m not sure I understand what ‘federated’ means as part of

    4. Natively federated and asynchronous

    Can anyone explain it?

  • Federated systems are a pretty broad topic, generally having to do with cross-domain traffic where there’s no centralized master. In the simplest terms, you can see it in the ability of anyone with an XMPP server to chat with their friends on GTalk. GTalk respects your XMPP server as its own little principality, and based on that trust your XMPP server can say “Lee is a user and he is trying to contact one of your GTalk users”. Each system remains independent and without a central master, yet they can still interact.

    Moving up the complexity scale, federation is often used to talk about higher-order identity management. If you have a parent corp that has a bunch of child corps, you want each child corp to be able to function somewhat autonomously within its own domain, but you still need some overriding controls. Federation permits that duality: child-corp.com is both independent and still under the auspices of parent-corp.com.

    SAML and Liberty Alliance were among the first efforts to deliver protocols explicitly addressing federation (federation was very much the raison d’être for these specs), although arguably SMTP and XMPP have both been “federated” protocols since day one.
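To make the GTalk example concrete: in XMPP, federation falls out of addressing. A JID has the form localpart@domainpart/resourcepart, and a server delivers a stanza itself only when the domain is one it hosts; otherwise it hands the stanza to the other domain's server (located via DNS SRV records in practice). A minimal sketch of that decision; route and the domain names are hypothetical, and the parsing skips the full normalization rules a real server applies.

```python
from typing import NamedTuple, Optional, Set


class JID(NamedTuple):
    local: Optional[str]
    domain: str
    resource: Optional[str]


def parse_jid(text: str) -> JID:
    """Split 'local@domain/resource'; local and resource are optional.

    Simplified: only the domain is lowercased, whereas real servers apply
    full normalization (stringprep/PRECIS) to every part.
    """
    bare, _, resource = text.partition("/")  # the resource may contain '/' or '@'
    if "@" in bare:
        local, _, domain = bare.partition("@")
    else:
        local, domain = "", bare
    if not domain:
        raise ValueError(f"JID has no domain: {text!r}")
    return JID(local or None, domain.lower(), resource or None)


def route(to: str, hosted_domains: Set[str]) -> str:
    """Hypothetical routing decision a server makes for an outgoing stanza."""
    domain = parse_jid(to).domain
    if domain in hosted_domains:
        return "local"  # one of our own users or components
    return f"s2s:{domain}"  # federate: hand off to that domain's own server
```

No central directory appears anywhere: each server only needs to know which domains it hosts, which is the "no central master" property described above.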

  • eric carle

    could someone please explain how an xmpp webservice would push a message to the browser? i do not see how the browser would get the message besides polling over HTTP (or using comet push)

  • It is a shame that this talk is entitled “Beyond REST…” because REST is not defined by polling. REST is also 100% valid in a callback situation. Unfortunately, I think this session title may inappropriately tarnish REST as “old news” when in fact, like the law of gravity, it is and always will be fundamentally relevant. Whether people choose to embrace it is another story, however.

  • Emil Hesslow
  • eric carle

    hi emil,

    thanks for your answer!

    so it’s essentially comet, right? (http://metajack.wordpress.com/2008/07/02/xmpp-is-better-with-bosh/)

    but if comet helps to prevent polling, then i agree with Mike Schinkel: this push model is not limited to XMPP at all, and you can achieve pretty much the same thing with REST+comet (the presentation seems to imply that REST is bad because of polling).

    or am i missing something?

  • Emil Hesslow

    From what I understand, BOSH and Comet are related. BOSH is just one part of XMPP. If you read the article you linked to, you’ll see “The Downside of BOSH”.

    If you compare XMPP BOSH and Comet, there are differences. XMPP is a standardized protocol: you get authorization and things like that. Comet just holds a connection open for a long time so you can get answers. There is no specification for what messages should look like, what you should answer, and so on.

    There are also tons of servers, clients and libraries for XMPP, so it is very easy to take existing parts and use them.

    But yes, you could probably do the same thing over REST+Comet, but I don’t know of anyone who has written any kind of protocol that everyone can follow. XMPP has that. So if Flickr put up an XMPP API today, it would probably take about 5 minutes for anyone with XMPP knowledge to start using it.
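The point about BOSH being standardized is easy to see on the wire. BOSH (XEP-0124, with XEP-0206 covering XMPP over it) is long polling, but every HTTP POST carries a body wrapper in the http://jabber.org/protocol/httpbind namespace with a session ID (sid) and a request ID (rid) that the client increments by exactly one per request, so the connection manager can detect lost or reordered requests. A minimal sketch of the client side of that wrapping, assuming the session has already been created; BoshSession is a hypothetical helper and the sid and starting rid are made-up values.

```python
import itertools
from typing import List
from xml.sax.saxutils import quoteattr

HTTPBIND_NS = "http://jabber.org/protocol/httpbind"


class BoshSession:
    """Client-side request wrapping for an already-created BOSH session."""

    def __init__(self, sid: str, first_rid: int) -> None:
        self.sid = sid
        self._rids = itertools.count(first_rid)  # rid rises by exactly one per request

    def wrap(self, stanzas: List[str]) -> str:
        """Return the body of the next HTTP POST.

        `stanzas` are already-serialized stanzas, each carrying
        xmlns='jabber:client'. With an empty list the result is an empty
        body wrapper: the long-poll request the connection manager holds
        open until it has something to send back (or the wait time runs out).
        """
        rid = next(self._rids)
        attrs = (f"xmlns={quoteattr(HTTPBIND_NS)} sid={quoteattr(self.sid)} "
                 f"rid={quoteattr(str(rid))}")
        return f"<body {attrs}>{''.join(stanzas)}</body>"
```

This is the agreed-upon framing that a hand-rolled Comet endpoint would have to invent for itself.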

  • Another benefit to XMPP over hand-rolling a Comet system is the addressing. You can write a component or client that can easily target any endpoint by identity, regardless of how it is currently connected. “Currently connected” might mean over traditional XMPP/tcp, BOSH, or completely disconnected. Presence is the glue that turns all of these potential connection abilities into routable channels.
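A toy version of that presence-based addressing: the router tracks which resources of each bare JID are online and over which transport, so a sender only ever needs the address. This is a hypothetical sketch, not a real server's logic: RFC 6121 has more detailed delivery rules (resource priorities, message types), which it simplifies to "every online resource", and the offline list stands in for offline message storage.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class PresenceRouter:
    """Toy presence table: which resources of a bare JID are online, and how."""

    def __init__(self) -> None:
        # bare JID -> {resource: transport name, e.g. "tcp" or "bosh"}
        self._online: DefaultDict[str, Dict[str, str]] = defaultdict(dict)
        self.offline: List[Tuple[str, str]] = []  # (bare JID, payload) kept for later

    def available(self, bare_jid: str, resource: str, transport: str) -> None:
        self._online[bare_jid][resource] = transport

    def unavailable(self, bare_jid: str, resource: str) -> None:
        self._online[bare_jid].pop(resource, None)

    def deliver(self, bare_jid: str, payload: str) -> List[str]:
        """Return '<resource> via <transport>' per delivery, or store offline."""
        sessions = self._online.get(bare_jid, {})
        if not sessions:
            self.offline.append((bare_jid, payload))
            return []
        return [f"{res} via {transport}" for res, transport in sorted(sessions.items())]
```

The sender's code is identical whether the recipient is on a desktop client over TCP, in a browser over BOSH, or disconnected; only the presence table changes.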

  • It seems that a bunch of the comments here are focused on XMPP for end-user service traffic, with BOSH as one of the possible ways to make that happen. However, the discussions we had at the XMPP Summit mostly focused on service-to-service traffic.

    In that scenario, only the services themselves need to implement the exchange of updates through XMPP publish-subscribe, leaving the rest of the service as-is. End users wouldn’t necessarily notice a difference, apart from lower latency and a significantly lower chance of data loss.

    Note that for third-party applications, the same mechanism can be used to keep the client up to date. These are usually not run from within a browser, and as such are not limited to HTTP. Adding XMPP capabilities to such an application is relatively straightforward.

    The last use case, showing updates on a web page as they come in, doesn’t necessarily involve XMPP at all, but could use any of the current technologies like Comet. BOSH does, however, seem to appeal to some, and services like Chesspark’s do a pretty good job at that.