WebOps Hack #1: Simple Availability Report for Busy Teams

I created this spreadsheet for tracking availability and “days since last outage”. Along with the availability and uptime calculations, it asks the following questions:

  • What broke?
  • Why?
  • What fixed it?
  • What did we learn?
  • How can we prevent recurrence?
  • Who owns follow-up?
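The spreadsheet's own formulas aren't reproduced here, but the availability and “days since last outage” figures come down to simple arithmetic. What follows is a minimal Python sketch under stated assumptions: availability is taken to be the share of the reporting period not covered by outages, outages are assumed not to overlap, and “days since last outage” is counted from the moment the most recent outage ended. The `Outage` record, its field names, and the example data are hypothetical, chosen only to mirror the questions above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Outage:
    """One row of a hypothetical outage log, mirroring the spreadsheet's questions."""
    start: datetime
    end: datetime
    what_broke: str = ""
    why: str = ""
    what_fixed_it: str = ""
    lessons: str = ""
    prevention: str = ""
    follow_up_owner: str = ""


def availability_percent(outages: Sequence[Outage],
                         period_start: datetime,
                         period_end: datetime) -> float:
    """Percentage of the reporting period not covered by an outage.

    Assumes outages do not overlap; each one is clipped to the period.
    """
    period = period_end - period_start
    downtime = timedelta(0)
    for outage in outages:
        start = max(outage.start, period_start)
        end = min(outage.end, period_end)
        if end > start:  # skip outages entirely outside the period
            downtime += end - start
    # timedelta / timedelta yields a plain float ratio
    return 100 * (period - downtime) / period


def days_since_last_outage(outages: Sequence[Outage],
                           now: datetime) -> Optional[int]:
    """Whole days since the most recent outage ended, or None if none has."""
    ended = [outage.end for outage in outages if outage.end <= now]
    if not ended:
        return None
    return (now - max(ended)).days


# Hypothetical example: one 7h12m outage in a 30-day June reporting period.
june_start = datetime(2024, 6, 1)
june_end = datetime(2024, 7, 1)
outages = [
    Outage(datetime(2024, 6, 10, 2, 0), datetime(2024, 6, 10, 9, 12),
           what_broke="database failover",
           follow_up_owner="ops team"),
]
print(availability_percent(outages, june_start, june_end))  # 99.0
print(days_since_last_outage(outages, datetime(2024, 6, 20, 12, 0)))  # 10
```

Seven hours and twelve minutes is 1% of a 30-day month, which is why the example lands at exactly 99.0%. Clipping each outage to the reporting period keeps an incident that straddles a month boundary from being counted twice.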

I’ve found this spreadsheet to be the “simplest thing that could possibly work” for identifying problems and tracking issues, whether before a formal incident-tracking system is in place or with vendors and other teams you want to keep honest. Please let me know whether it’s helpful for you and how it might be improved. (Feel free to improve it yourself, too: it’s licensed under Creative Commons Attribution-ShareAlike.)

The spreadsheet is available as a Google doc here. You need to choose “Copy to a new spreadsheet” before you can use it.
