This is part of the Velocity Profiles series, which highlights the work and knowledge of web ops and performance experts.
How did you get into web operations and performance?
Picnik’s founders Mike Harrington and Darrin Massena needed someone who knew something about Linux. Darrin and I had known each other for a few years, so my name came up. At the time, I was doing embedded systems work, but ended up moonlighting for Picnik. It wasn’t long before I came over full time. I always expected to help them get off the ground and then they’d find a “real sysadmin” to take over. Turns out, I ended up enjoying ops! I was lucky enough to straddle the world between ops and back-end dev. Sound familiar?
What is your most memorable project?
Completing a tight database upgrade at a Starbucks mid-way between Seattle and Portland. “Replicate faster, PLEASE!” Also, in the build-up to Picnik’s acquisition by Google, Mike asked me what it would take to handle 10 times our current traffic and to do it in 30 days. We doubled Picnik’s hardware, including a complete network overhaul. It went flawlessly and continued to serve Picnik until Google shut it down in April of this year.
What’s the toughest problem you’ve had to solve?
When Flickr launched with Picnik as its photo editor, we started to see really weird behavior that caused some Flickr API calls to hang. I spent a good chunk of that day on the phone with John Allspaw and finally identified the issue: our NAT box was munging TCP timestamps in a way that interacted badly with Flickr’s servers. I learned a couple of things. First, both John and I were able to gather highly detailed info (tcpdumps) at key points in our networks (and hosts); sometimes you just have to go deep. Second, it’s absolutely imperative that you have good technical contacts with your partners.
What tools and techniques do you rely on most?
Graphs and monitoring are critical. Vim, because I can’t figure out Emacs. Automation, because I can’t even remember what I had for
Who do you follow in the web operations and performance world?
What is your web operations and performance super power?
I think I’m good at building, maintaining, and understanding complete systems. Other engineering disciplines are typically concerned with the details of a single part of a larger system. As web engineers, we have to grok the system, the components, and their interactions … at 2 AM.