Amazon's new EC2 SLA

Amazon announced a new SLA for EC2, similar to the one for S3. This is a notable step for Amazon and for cloud computing as a whole, because it sets a new bar for utility computing services. Amazon is committing to 99.95% availability for the EC2 service on a yearly basis, which corresponds to approximately four hours and twenty-three minutes…
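As a quick check of the arithmetic, the downtime budget implied by an annual availability target is (1 - availability) multiplied by the length of the year. The Python sketch below assumes a 365-day year (8,760 hours); the SLA's own definition of the measurement period, and how it counts partial outages, may differ.

```python
# Minimal sketch: convert an annual availability target into a downtime budget.
# Assumes a 365-day year; the SLA's actual measurement rules may differ.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def annual_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year allowed by an availability target (in percent)."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


def format_hours_minutes(total_minutes: float) -> str:
    """Render a duration as 'Xh Ym', rounded to the nearest minute."""
    hours, minutes = divmod(round(total_minutes), 60)
    return f"{hours}h {minutes}m"


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        budget = annual_downtime_minutes(target)
        print(f"{target}% -> {format_hours_minutes(budget)} per year")
```

For 99.95% this gives 262.8 minutes, which rounds to 4h 23m and matches the figure in the post.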

You Become what You Disrupt (part two)

Google's GrandCentral (Radar coverage) was down over the weekend, resulting in missed calls and other phone problems for its users. This is very similar to the two-day Skype outage last year, which prompted me to say that "You Become what You Disrupt". I've spoken about this issue several times, most recently at the Princeton CITP "Computing in the Cloud" workshop…

Paging systems and Conference Bridges for startups & small teams

Early registration for the Velocity Web Performance & Operations Conference has opened. To help spread the word, I’ve written up this “simplest thing that will work” hack for a common operations need: paging systems and conference bridges. Step 1: Establish a team contact list with SMS email addresses. Create a Google Spreadsheet to hold a team roster like this one. My…
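One way to turn such a roster into a paging list is a small script over a CSV export of the spreadsheet. This is a sketch under stated assumptions, not the post's method: the column names (`name`, `phone`, `carrier`), the `SMS_GATEWAYS` table, its placeholder domains, and the `MAX_PAGE_CHARS` limit are all hypothetical. Real email-to-SMS gateway domains vary by carrier and may change or be discontinued, so check them before relying on this.

```python
# Minimal sketch of SMS-via-email paging built from a team roster CSV.
# Hypothetical assumptions: columns name/phone/carrier, and the placeholder
# gateway domains below (substitute your carriers' real gateways).
import csv
import io
from email.message import EmailMessage

# Hypothetical carrier -> email-to-SMS gateway mapping.
SMS_GATEWAYS = {
    "carrier-a": "sms.carrier-a.example",
    "carrier-b": "txt.carrier-b.example",
}

# Assumed budget: SMS is roughly 160 characters, and a gateway may add its own text.
MAX_PAGE_CHARS = 140


def sms_addresses(roster_csv: str) -> list[str]:
    """Turn roster rows into SMS email addresses, skipping unknown carriers."""
    addresses = []
    for row in csv.DictReader(io.StringIO(roster_csv)):
        domain = SMS_GATEWAYS.get(row["carrier"].strip().lower())
        digits = "".join(ch for ch in row["phone"] if ch.isdigit())
        if domain and digits:
            addresses.append(f"{digits}@{domain}")
    return addresses


def build_page(sender: str, recipients: list[str], text: str) -> EmailMessage:
    """Compose one short page addressed to every recipient at once."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "PAGE"
    msg.set_content(text[:MAX_PAGE_CHARS])
    return msg
```

To actually send the page, the message can be handed to Python's standard `smtplib`, for example `smtplib.SMTP(host).send_message(msg)`, where `host` stands in for your own mail relay.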

WebOps Hack #1: Simple Availability Report for Busy Teams

I created this spreadsheet for tracking availability and "days since last outage". Along with the availability and uptime calculations, it asks the following questions: What broke? Why? What fixed it? What did we learn? How can we prevent recurrence? Who owns follow-up? I've found this to be the "simplest thing that could possibly work" for identifying problems and tracking issues…
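The two headline numbers in such a report can be computed from a simple outage log. This Python sketch is not the spreadsheet's actual formulas; it assumes each outage is recorded with start and end timestamps, that logged outages don't overlap, and that the follow-up questions are stored as free-text fields on the same record. The field and function names are illustrative.

```python
# Minimal sketch of an availability report built from an outage log.
# Assumptions: outages have start/end times and do not overlap; the
# post-mortem questions are free-text fields (names are illustrative).
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Outage:
    start: datetime
    end: datetime
    what_broke: str = ""
    why: str = ""
    what_fixed_it: str = ""
    lessons: str = ""
    prevention: str = ""
    follow_up_owner: str = ""


def availability_pct(outages: list[Outage], window_start: datetime, window_end: datetime) -> float:
    """Percent of the reporting window not covered by outages."""
    window = window_end - window_start
    downtime = timedelta()
    for o in outages:
        # Clip each outage to the reporting window before summing.
        start = max(o.start, window_start)
        end = min(o.end, window_end)
        if end > start:
            downtime += end - start
    return 100 * (1 - downtime / window)


def days_since_last_outage(outages: list[Outage], now: datetime) -> Optional[int]:
    """Whole days since the most recent outage ended; None if none has ended yet."""
    ended = [o.end for o in outages if o.end <= now]
    if not ended:
        return None
    return (now - max(ended)).days


def missing_owner(outages: list[Outage]) -> list[Outage]:
    """Outages that still have nobody assigned to the follow-up."""
    return [o for o in outages if not o.follow_up_owner.strip()]
```

For example, a single outage of 3 hours 36 minutes in a 30-day window works out to 99.5% availability.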