Running time: 00:10:46
If there’s a site that exemplifies explosive growth, it has to be Twitter. It seems like everywhere you look, someone is Tweeting, or talking about Tweeting, or Tweeting about Tweeting. Keeping the site responsive under that type of increase is no easy job, but it’s one that John Adams has to deal with every day, working in Twitter Operations. He’ll be talking about that work at O’Reilly’s Velocity Conference, in a session entitled Fixing Twitter: Improving the Performance and Scalability of the World’s Most Popular Micro-blogging Site, and he spent some time with us to talk about what is involved in keeping the site alive.
James Turner: Can you start by describing the platforms and technologies that make Twitter run today?
John Adams: Twitter currently runs on Ruby on Rails. And we also use a combination of Java and Scala, and a number of homegrown scripts that run the site. We also use a lot of open-source tools like Apache, MySQL, memcached.
JT: What type of hardware are you running on?
JA: It’s all Linux, so a lot of x86 hardware. I can’t tell you the brands or how many.
JT: Do you make any kind of attempt to stay homogeneous in that?
JA: Yes, we do. All of our hardware is very consistent. It makes deployment of new software very easy. And we also use a number of configuration management tools like Puppet to deliver software to those machines.
JT: As anyone can see, Twitter has had a pretty explosive growth, especially recently. Were you prepared for this kind of ramp up?
JA: I don’t think so. I mean we’re growing week over week in enormous numbers. And we spend a lot of time calculating the growth and scalability of the site to make sure that we can handle the upcoming load.
JT: I mean obviously there are events like Oprah decides she’s going to Tweet that are going to be spikes. Do you try to get warning of that stuff?
JA: Yeah. And frequently we know of major events happening. Major events like Macworld are very predictable, and even with a massive amount of media interaction, we have some fair warning beforehand.
JT: There was a period of time when Twitter’s reliability was kind of a sore point and people got kind of sick of the little birdie telling them the site was unavailable. What kind of effort was required to solve those problems? And what was the methodology you used?
JA: First thing, it was probably the whale, not the bird.
JT: Oh, sorry. You’re right. The whale.
JA: If we see the bird, things are probably okay. But the thing is we have a very metrics-driven culture. And as part of my talk at Velocity, I’ll describe this. But we have a cycle. And the methodology has always been to discover the bottlenecks using detailed metrics and reporting of data. Resolve that bottleneck. And that could be modifying software, modifying process or changing the way that we handle incoming messages. Close that bottleneck. Just make it go away. And then repeat that loop over and over and over again.
And we do that through graphing. We do that through instrumenting the application. And it’s very important that we don’t make any changes until we’ve looked at the instrumentation. Sites frequently get caught up in, “Well, maybe I’ll try something; it’ll make things better.” And we really don’t work that way. We’re very much a metrics-driven culture.
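The discover-resolve-repeat loop he describes can be sketched in Ruby (all names here are hypothetical, not Twitter's actual tooling): time each request, keep a rolling window of samples, and report a percentile, so a bottleneck or a regression shows up as a number rather than a hunch.

```ruby
# Minimal latency instrumentation sketch (illustrative names only).
class LatencyTracker
  def initialize(window = 1000)
    @window = window    # how many recent samples to keep
    @samples = []
  end

  # Time the block, record its duration in milliseconds, and
  # return the block's result so the caller is unaffected.
  def measure
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
    @samples << elapsed_ms
    @samples.shift if @samples.size > @window
    result
  end

  # Nearest-rank percentile over the current window (nil if empty).
  def percentile(p)
    return nil if @samples.empty?
    sorted = @samples.sort
    sorted[((p / 100.0) * (sorted.size - 1)).round]
  end
end

tracker = LatencyTracker.new
100.times { tracker.measure { 2 + 2 } }   # stand-in for real request handling
p95 = tracker.percentile(95)
```

Comparing a percentile like this before and after a deploy is one concrete way to answer "did we make the site worse?" with data instead of guesswork.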
JT: Was there kind of a painful transition that some sites have from being kind of the garage band site to kind of growing up? Or was there always a culture of corporateness to the environment?
JA: Well, I would say that the culture here is still very — it’s still very open. I mean the company’s been running for two-and-a-half years and it’s very much a start-up environment. We have lunch everyday together. It’s a very fun office environment. As far as the corporate nature, I think what you might be referring to is the formalization of engineering requirements, that going from this sort of Wild West changing things everyday to going to a more disciplined culture around deploying software and testing. And that’s definitely true. The process that we have around deploys and the process that we have around software development has become much more structured, and that’s definitely helped out with site stability. Now when we deploy software, we are very, very cognizant of, “Did we increase the error rate? Have we made the site worse? Have we changed the response time?” And those metrics are very important to us.
JT: I assume you have some kind of sandboxing or staging where you can apply load to a new release before it goes live?
JA: Certainly. We have staging environments. We have different admin environments. We have a lot of isolation before things go live. And, in fact, many developers develop locally on their computers. We have a lot of Macs here. You may have seen the recent Apple business profile of us. But we are a heavy Mac shop. And the developers can run their entire environment, all of Twitter, on a single machine if they need to for testing.
JT: How is Twitter different from a traditional Web 2.0 site in terms of its needs?
JA: Do you mean hardware, software, or just in general?
JT: How you go about thinking about it?
JA: I think it’s different because the majority of our traffic takes place over the API and not over the web. And that means that if we have issues that are on the website as opposed to issues that users would see in a client, it’s a very different load profile. Many people communicate with us through SMS, for example, instead of the website. And most Web 2.0 companies are very concerned with the look and feel of the website and how things perform. And we’re more of a conduit.
JT: Twitter certainly does have a certain minimalism to it, a very Google-esque minimalism to the design.
JA: Yeah, I would say that.
JT: So, as you just mentioned, Twitter does live as part of a larger ecosystem of social networking sites and applications. They all interconnect to some extent. A lot of your traffic does come out of iPhone apps and cell phone apps and widgets on desktops. How conscious do you have to be of how third parties are consuming your APIs?
JA: Well, I mean we definitely have terms of service that third parties have to follow. We’re very open to abuse because of that. We’re frequently disabling accounts because of spam and people abusing the API. That changes the way you run the site. It changes the way you run security. And abuse definitely impacts performance. The more abuse we have, the less legitimate traffic can use the site.
JT: To what extent do you find yourself almost in a partnership arrangement? Obviously, the APIs are mainly of benefit to the application developers consuming them, so how much of a cooperative environment do you have there?
JA: Well, we communicate with API consumers every day on the development mailing list. That’s mostly run by Alex Payne and Doug. They handle our API support. We have a website that gets updated every day with different tips on how to use our API. So it is — it’s a cooperative arrangement but a very open arrangement between the developers and Twitter.
JT: As Twitter has evolved, the use cases have changed somewhat. You see people using a lot more of the searching and tagging features now, as people have kind of evolved how they use it. How much does that change the load and nature of the site? And how hard is it to stay ahead of that?
JA: Well, search is a good example. It’s very well-known that when we released search to the public, we released it in very small batches initially. So you can see — a lot of people said, “Oh, I have the search box on my page.” And many people said, “Well, I don’t have the search box.” So we do bucket testing to make sure that the site is going to be able to take the load. And then we release a feature. At least that’s how that feature went out.
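The bucket testing he describes, where some users see the search box and others don't, can be sketched roughly like this in Ruby (illustrative only; the class and names are assumptions, not Twitter's code): hash each user id into a stable bucket, and enable the feature only for ids that fall below the current rollout percentage.

```ruby
require 'zlib'

# Percentage-based feature rollout sketch (hypothetical names).
class FeatureRollout
  def initialize(feature, percent)
    @feature = feature
    @percent = percent   # 0..100: share of users who see the feature
  end

  # Stable bucket in 0..99 derived from the feature name and user id,
  # so the same user always lands in the same bucket.
  def bucket(user_id)
    Zlib.crc32("#{@feature}:#{user_id}") % 100
  end

  def enabled_for?(user_id)
    bucket(user_id) < @percent
  end
end

search_box = FeatureRollout.new("search_box", 10)  # roughly 10% of users
search_box.enabled_for?(42)   # stable answer for this user
```

Because the bucket is derived from a hash rather than stored state, raising the percentage widens the audience without reshuffling who already had the feature, and the load test can grow in controlled steps.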
JT: Can you give us some feel for what you’re going to be talking about at Velocity, what people can look forward to?
JA: Yeah. In the talk at Velocity, I want to spend some time focusing on the way that process and metrics allow people to really understand how their website works, and on being able to interpret those metrics and use them for scaling. You know, there are limits to what we can talk about regarding the way that Twitter works internally, mainly because of NDA issues and whatnot. But I want to be able to take the lessons from here and put them in a format where they can be applied to anyone’s site. And I think one of the things that we like to show people at Velocity is that there is a different way of looking at performance problems. We encourage getting people into a metrics-driven culture. And small companies and small web shops may not realize at first how important it is to use numbers to make decisions. So we’ll cover that. We’ll also cover some best practices for Ruby on Rails deployments and the importance of doing things asynchronously as opposed to synchronously: when you’re sending mail to a user, those processes should be done outside of the scope of the web request. Some things like that.
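The asynchronous-mail point can be sketched in Ruby (a toy in-process queue for illustration; a real deployment would use a dedicated queue server and separate worker processes, and all names here are hypothetical): the web request only enqueues a job and returns immediately, while a background worker does the slow delivery.

```ruby
require 'thread'

# Toy background job queue (illustrative, not production code).
class MailQueue
  def initialize
    @jobs = Queue.new        # thread-safe FIFO
    @delivered = []
    # Background worker: blocks on pop, delivers jobs as they arrive.
    @worker = Thread.new { loop { deliver(@jobs.pop) } }
  end

  # Called from the web request: cheap, returns immediately.
  def enqueue(to, body)
    @jobs << { to: to, body: body }
  end

  # Runs in the worker thread, outside any web request.
  def deliver(job)
    # Real code would talk to an SMTP server here.
    @delivered << job
  end

  # Block until at least n jobs have been delivered (for demos/tests).
  def wait_until_delivered(n)
    sleep 0.01 until @delivered.size >= n
    @delivered
  end
end

queue = MailQueue.new
queue.enqueue("user@example.com", "Welcome!")  # request path returns right away
queue.wait_until_delivered(1)
```

The design point is that the user's response time no longer includes the SMTP round trip; the request path only pays the cost of a queue push.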
JT: Sounds good. So to put you on the spot, when you tweet, what are you tweeting with?
JA: If I’m sitting at my desk, right now I’m using Tweetie, which is really great. I used to use Twitterrific, but now I use Tweetie. And on my iPhone, I also use Tweetie. I switched over from Twitterrific to Tweetie recently because it has a better interface to our API, and it has support for search trends. But, honestly, there are many, many good clients out there. And I think it’s good to go on the iPhone App Store or go on websites and find one that you really like, because there are a lot of features that are different between the interfaces.
JT: I have to say I’ve become a Tweetdeck fan myself recently.
JA: Yeah. That’s a great product, too. I think it gives you a really broad scope of what’s happening on Twitter all at once.
JT: We’ve been talking to John Adams who works in Twitter Operations. He’ll be speaking at the Velocity Conference on Fixing Twitter: Improving the Performance and Scalability of the World’s Most Popular Microblogging Site. Thank you for taking time to talk to us.
JA: No problem.