Dave Zwieback on learning reviews and humans keeping pace with complex systems

O'Reilly Radar Podcast: Learning from both failure and success to make our systems more resilient.

In this week’s Radar Podcast episode, I chat with Dave Zwieback, head of engineering at Next Big Sound and CTO of Lotus Outreach. Zwieback is the author of a new book, Beyond Blame: Learning from Failure and Success, which outlines an approach that not only makes postmortems blameless but also turns them into a productive learning process. We talk about his book, the framework for conducting a “learning review,” and how humans can keep pace with the growing complexity of the systems we’re building.

When you add scale to anything, it becomes sort of its own problem. Meaning, let’s say you have a single computer, right? The mean time to failure of the hard drive or the computer is actually fairly lengthy. When you have 10,000 of them or 10 million of them, you’re having tens if not hundreds of failures every single day. That certainly changes how you go about designing systems. Again, whenever I say systems, I also mean organizations. To me, they’re not really separate.
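To put rough numbers on that point about scale, here is a minimal back-of-the-envelope sketch. It is not from the podcast: the mean time to failure of 1,000,000 hours is a hypothetical figure chosen only for illustration, and the calculation assumes units fail independently at a constant rate, which real fleets only approximate.

```python
def expected_failures_per_day(fleet_size: int, mttf_hours: float) -> float:
    """Expected failures per day for a fleet of identical units.

    Assumes independent units with a constant failure rate, so each unit
    fails on average once every `mttf_hours` hours.
    """
    return fleet_size * 24.0 / mttf_hours


# Hypothetical MTTF, assumed for illustration only (not a figure from the source).
ASSUMED_MTTF_HOURS = 1_000_000

for fleet_size in (1, 10_000, 10_000_000):
    rate = expected_failures_per_day(fleet_size, ASSUMED_MTTF_HOURS)
    print(f"{fleet_size:,} units: {rate:g} expected failures per day")
```

Under these assumptions, a single machine almost never fails on any given day, while a fleet of 10 million sees about 240 failures daily. The exact figures depend entirely on the assumed MTTF; the point is only that failure shifts from a rare event to a routine one as the fleet grows.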

I spent a bunch of my time in fairly large-scale organizations, and I’ve witnessed and been part of a significant number of outages or issues. I’ve seen how dysfunctional organizations dealing with failure can be. By the way, when we mention failure, it’s important for us not to forget about success. All the things that we find in the default ways that people and organizations deal with failure, we find in the default ways that they deal with success. It’s just a mirror image of each other.

We can learn from both failures and success. If we’re only learning from failures, which is what the current practice of postmortem is focused on, then we’re missing … the other 99% of the time when they’re not failing. The practice of learning reviews allows for learning from both failures and successes.
