"webops" entries

Velocity Preview – Keeping Twitter Tweeting

If there’s a site that exemplifies explosive growth, it has to be Twitter. It seems like everywhere you look, someone is Tweeting, or talking about Tweeting, or Tweeting about Tweeting. Keeping the site responsive under that kind of growth is no easy job, but it’s one that John Adams deals with every day in Twitter Operations. He’ll be talking about that work at O’Reilly’s Velocity Conference, in a session entitled “Fixing Twitter: Improving the Performance and Scalability of the World’s Most Popular Micro-blogging Site,” and he spent some time with us to talk about what is involved in keeping the site alive.

AT&T Fiber cuts remind us: Location is a Basket too!

by Jesse Robbins | @jesserobbins | April 10, 2009

The fiber cuts affecting much of the San Francisco Bay Area this week are similar to the outages in the Middle East last year (radar post), although far more limited in scope and impact. What I said last year still holds true and is repeated below: From an operations perspective these kinds of outages are nothing new, and underscore why…

Understanding Web Operations Culture – the Graph & Data Obsession

by Jesse Robbins | @jesserobbins | February 5, 2009

“We’re quite addicted to data pr0n here at Flickr. We’ve got graphs for pretty much everything, and add graphs all of the time.” -John Allspaw, Operations Engineering Manager at Flickr & author of The Art of Capacity Planning

One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic…

Velocity 2009: Themes, ideas, and call for participation…

by Jesse Robbins | @jesserobbins | November 20, 2008

Last year's Velocity conference was an incredible success. We expected around 400 people and we ended up maxing out the facility with over 600. This year we're moving the conference to a bigger space and extending it to 3 days to accommodate workshops and longer sessions. Velocity 2009 will be on June 22-24th, 2009 at the Fairmont Hotel in San…

Sprint blocking Cogent network traffic…

by Jesse Robbins | @jesserobbins | October 31, 2008

It appears that Sprint has stopped routing traffic from Cogent (called “depeering”) as a result of some sort of legal dispute. Sprint customers cannot reach Cogent customers, and vice versa. The effect is similar to what would happen if Sprint were to block voice phone calls to AT&T customers. Here’s a graph that shows the outage, courtesy of Keynote: Rich…

Hyperic CloudStatus service dashboard launches at Velocity!

by Jesse Robbins | @jesserobbins | June 23, 2008

Javier Soltero just launched CloudStatus during his Hyperic sponsor session today at Velocity. CloudStatus is a public health dashboard for web services like Amazon's EC2/S3, and Google's App Engine. Javier called to tell me about this last week after I declared that "Service Monitoring Dashboards are mandatory". This comes right after Amazon and Google had visible outages, and couldn't have…

Service Monitoring Dashboards are mandatory for production services!

by Jesse Robbins | @jesserobbins | June 17, 2008

Google App Engine went down earlier today. GAE is still a developer preview release, and currently lacks a public monitoring dashboard. Unfortunately this means that many people either found out from their app and/or admin consoles being unavailable or from Mike Arrington's post on TechCrunch. Google has a strong Web Operations culture, and there are numerous internal monitoring tools in…

Two new open source projects at Velocity

by Jesse Robbins | @jesserobbins | June 17, 2008

At Velocity next week there will be two significant open source projects debuting. The first is the Jiffy: Open Source Performance Measurement and Instrumentation tool created by Scott Ruthfield and his team at Whitepages.com. Most tools for measuring web performance come in two flavors: developer-installed tools (Firebug, Fiddler, etc.) that allow individuals to closely trace single sessions; third-party performance monitoring…