Understanding Web Operations Culture – the Graph & Data Obsession

We’re quite addicted to data pr0n here at Flickr. We’ve got graphs for pretty much everything, and add graphs all of the time.

-John Allspaw, Operations Engineering Manager at Flickr & author of The Art of Capacity Planning

One of the most interesting parts of running a large website is watching how seemingly unrelated events affect user traffic in aggregate. Web traffic is something that companies typically keep very secret, and often the only time engineers can talk about it is late at night, at a bar, and very much off the record.

There are many good reasons for keeping this kind of information confidential, particularly for publicly traded companies with complicated disclosure requirements. There are also downsides, the biggest being that it is difficult for peers to learn from each other and compare notes.

John Allspaw recently created a WebOps Visualizations group on Flickr for sharing these kinds of graphs with the confidential information removed. Here’s an example of a traffic drop seen both by Flickr & by Last.FM that coincided with President Obama’s inauguration.

John Allspaw shows drop in web traffic to Flickr during Obama inauguration

Similar traffic drop on Last.FM seen on the right

Traffic Drop to Last.FM during Obama inauguration on right

Google saw a similar drop as well

Traffic Drop to Google during Obama Inauguration

Was it because everybody went to Twitter?

Traffic Spike on Twitter during Obama Inauguration

Besides being an interesting story, sharing these kinds of graphs helps people build better monitoring tools and processes. As just one example: How should the WebOps team respond to this dip in traffic? Is it an outage? The inauguration was a very well known event, so it’s easy to explain the drop in traffic… but what happens when a similar drop occurs without such an obvious cause? Should the WebOps team be looking at CNN (or trends on Twitter) along with everything else?

How do you tell when that unexpected 10% drop in traffic is really just people with something more important to do than browse your site?
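One rough way to frame that question in code is to compare the current traffic level against a baseline for the same hour in previous weeks, and then check whether a known external event could explain the gap. The sketch below is only an illustration under stated assumptions: the function names, the 10% threshold, the use of a median baseline, and the idea of a hand-maintained list of known events are all hypothetical and not taken from any of the sites mentioned here.

```python
from statistics import median


def traffic_deviation(current, history):
    """Return the fractional deviation of `current` from the median of `history`.

    `history` is assumed to hold request counts for the same hour of the week
    in previous weeks (a hypothetical baseline choice).
    """
    baseline = median(history)
    if baseline <= 0:
        raise ValueError("baseline traffic must be positive")
    return (current - baseline) / baseline


def classify_drop(current, history, known_events=(), threshold=0.10):
    """Classify a traffic reading as 'normal', 'explained', or 'investigate'.

    `known_events` is an assumed, manually maintained list of external events
    (news, sports, weather) overlapping the current time window.
    """
    deviation = traffic_deviation(current, history)
    if deviation > -threshold:
        return "normal"       # drop is smaller than the threshold
    if known_events:
        return "explained"    # a big drop, but something else is going on
    return "investigate"      # a big drop with no obvious outside cause


if __name__ == "__main__":
    same_hour_last_weeks = [1000, 1010, 990, 1005]
    print(classify_drop(850, same_hour_last_weeks))
    print(classify_drop(850, same_hour_last_weeks, known_events=["inauguration"]))
```

A check like this does not replace health alarms. It only separates "traffic is down and we know why" from "traffic is down and nobody can say why", and the second case is the one worth paging someone for.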

(Note: Updated since original posting to add Google & Twitter graphs and annotations, and to switch the Last.FM graphic with an annotated one after I got permission.)

  • Data porn is widely prevalent in operations – without it we’re blind. Too much of it, and we’re overwhelmed. Finding sanity in the sea of available data is the only way to succeed under the high loads we’re faced with every day.

    I should post the inverse spike present on Twitter during the inauguration. We drank your milkshake (traffic).

  • Checking the news is usually among the first three things we do when the traffic goes wonky and there aren’t any alarms going off to signal bad health. Predicting traffic swings as natural vs unnatural has been puzzling us for years, and we don’t yet have a good way to do it without a lot of false alarms.

    Take this week, for example. The Super Bowl leaves a very distinct pattern of traffic every year on one of our sites.

    The day starts out normally until the pre-game begins, then traffic decreases until halftime, spikes back almost to normal at halftime, and, depending on how the game goes, falls off even further during the last 10 minutes or so. Then it shoots up above a normal Sunday night as everyone does one last internet before bed.

    There’s a secondary effect on the operations team for stuff like this. We’ve become more aware of things we don’t care about personally, like awards shows, the first episode of whatever TV show is popular, said football game, etc. Then there are the odd unscheduled things, like weather. A couple of summers ago there were several days of heavy rain in the Midwest. It showed up in the traffic, since everyone was inside on the computer.

    It works in both ways, increasing and decreasing traffic. The hard part is tracking the context, attributing traffic to the correct cause, and not forgetting four months down the road that the traffic spike was because of a news story, and not because someone had pulled off some miraculous promotion. I have a 2007 calendar somewhere with highwater marks on it. Anna Nicole Smith. Virginia Tech Shootings. Rollout of a new feature and a new baseline. The red carpet photo gallery from the Emmys. Some horrifically bad ads code that went in during December. :D

    It’s fascinating to me, but I’m not sure anyone else cares.

  • John & Mandi,

    Please send me graphs and I’ll update this post. Please please please!


  • John –

    Really cool graphs! We saw something very similar to the inauguration drop on the night of election day. When Obama took the stage in Chicago, aggregate traffic dropped dramatically across all the sites that we deliver – these are dominated by media sites (newspapers, tv stations, etc.)



  • Thanks for posting the Twitter graphs. It’s amazing to see our peaks against the dips in the other services.

  • Twitter is the way of the future. People collaborating from one side of the globe to the other in an unbiased fashion.

    NICE post !

  • Hi,
    This was a very interesting article, and it hits on the problem of understanding web trends. What do we do with spikes and troughs when there is no such big event happening, and how do we make sense of dips that recur annually?
    Any help appreciated