"webops" entries

Velocity Preview – Keeping Twitter Tweeting

If there’s a site that exemplifies explosive growth, it has to be Twitter. It seems like everywhere you look, someone is Tweeting, or talking about Tweeting, or Tweeting about Tweeting. Keeping the site responsive under that type of increase is no easy job, but it’s one that John Adams has to deal with every day, working in Twitter Operations. He’ll be talking about that work at O’Reilly’s Velocity Conference, in a session entitled Fixing Twitter: Improving the Performance and Scalability of the World’s Most Popular Micro-blogging Site, and he spent some time with us to talk about what is involved in keeping the site alive.

AT&T Fiber cuts remind us: Location is a Basket too!

The fiber cuts affecting much of the San Francisco Bay Area this week are similar to the outages in the Middle East last year (radar post), although far more limited in scope and impact. What I said last year still holds true and is repeated below: From an operations perspective these kinds of outages are nothing new, and underscore why…

Understanding Web Operations Culture – the Graph & Data Obsession

"We’re quite addicted to data pr0n here at Flickr. We’ve got graphs for pretty much everything, and add graphs all of the time." -John Allspaw, Operations Engineering Manager at Flickr & author of The Art of Capacity Planning

One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic…

Velocity 2009: Themes, ideas, and call for participation…

Last year's Velocity conference was an incredible success. We expected around 400 people and we ended up maxing out the facility with over 600. This year we're moving the conference to a bigger space and extending it to 3 days to accommodate workshops and longer sessions. Velocity 2009 will be on June 22-24th, 2009 at the Fairmont Hotel in San…

DisasterTech: "Decisions for Heroes"

One of the most interesting DisasterTech projects I’ve been following is “Decisions for Heroes” led by developer and Irish Coast Guard volunteer Robin Blandford. Decisions is like Basecamp for volunteer Search & Rescue teams. The focus is on providing “just enough” process to complement the real-world workflow of a rescue team, without unnecessary complexity. One of Robin’s design goals is…

Sprint blocking Cogent network traffic…

It appears that Sprint has stopped routing traffic (called “depeering”) from Cogent as a result of some sort of legal dispute. Sprint customers cannot reach Cogent customers, and vice versa. The effect is similar to what would happen if Sprint were to block voice phone calls to AT&T customers. Here’s a graph that shows the outage, courtesy of Keynote: Rich…

Amazon's new EC2 SLA

Amazon announced a new SLA for EC2, similar to the one for S3. This is a notable step for Amazon and cloud computing as a whole, as it establishes a new bar for utility computing services. Amazon is committing to 99.95% availability for the EC2 service on a yearly basis, which corresponds to approximately four hours and twenty-three minutes…
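
For readers who want to check the arithmetic behind that figure, here is a minimal sketch. It assumes a 365-day year and treats the 99.95% commitment as a simple fraction of total minutes; the function name is ours for illustration and is not taken from Amazon's SLA text.

```python
def allowed_downtime_minutes(availability_pct, days=365):
    """Minutes of downtime permitted over `days` at the given availability.

    Assumption: the SLA is read as a plain percentage of wall-clock time.
    """
    total_minutes = days * 24 * 60          # 525,600 minutes in a 365-day year
    return total_minutes * (100 - availability_pct) / 100


minutes = allowed_downtime_minutes(99.95)   # roughly 262.8 minutes
hours, remainder = divmod(round(minutes), 60)
print(f"{hours} hours {remainder} minutes")
```

Rounded to the nearest minute, this matches the four hours and twenty-three minutes quoted in the excerpt.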

Hyperic CloudStatus service dashboard launches at Velocity!

Javier Soltero just launched CloudStatus during his Hyperic sponsor session today at Velocity. CloudStatus is a public health dashboard for web services like Amazon's EC2/S3, and Google's App Engine. Javier called to tell me about this last week after I declared that "Service Monitoring Dashboards are mandatory". This comes right after Amazon and Google had visible outages, and couldn't have…

Service Monitoring Dashboards are mandatory for production services!

Google App Engine went down earlier today. GAE is still a developer preview release, and currently lacks a public monitoring dashboard. Unfortunately this means that many people found out either from their app and/or admin consoles being unavailable or from Mike Arrington's post on TechCrunch. Google has a strong Web Operations culture, and there are numerous internal monitoring tools in…

Two new open source projects at Velocity

At Velocity next week there will be two significant open source projects debuting. The first is the Jiffy: Open Source Performance Measurement and Instrumentation tool created by Scott Ruthfield and his team at Whitepages.com. Most tools for measuring web performance come in two flavors: developer-installed tools (Firebug, Fiddler, etc.) that allow individuals to closely trace single sessions, and third-party performance monitoring…