Postmortems, sans finger-pointing: The O’Reilly Radar Podcast

In this episode, John Allspaw talks in-depth about blameless postmortems and creating a just culture.

Editor’s note: you can subscribe to the O’Reilly Radar Podcast through iTunes, SoundCloud, or directly through our podcast’s RSS feed.

When you’re dealing with complex systems, failure is going to happen; it’s a given. What we do after that failure, however, strongly influences whether or not that failure will happen again. The traditional response to failure is to seek out the person responsible and punish them accordingly — should they be fired? Retrained? Moved to a different position where they can’t cause such havoc again?

John Allspaw, SVP of technical operations at Etsy and co-chair of the O’Reilly Velocity Conference, argues that this “human error” approach is the equivalent of cutting off your nose to spite your face. He explains in a blog post that at Etsy, their approach is to “view mistakes, errors, slips, lapses, etc., with a perspective of learning.” To that end, Etsy practices “blameless postmortems” that focus more on the narrative of how something happened rather than who was behind it, and that remove punishment as an outcome of an investigation.

Four short links: 14 August 2014

Ceramic 3D Printing, Robo Proofs, Microservice Fail, and Amazing Graphics Tweaks

  1. $700 Ceramic-Spitting 3D Printer (Make Magazine) — ceramic printing is super interesting, not least because it doesn’t fill the world with plastic glitchy bobbleheads.
  2. Mathematics in the Age of the Turing Machine (Arxiv) — a survey of mathematical proofs that rely on computer calculations and formal proofs. (via Victoria Stodden)
  3. Failing at Microservices — deconstructed a failed stab at microservices. Category three engineers also presented a significant problem to our implementation. In many cases, these engineers implemented services incorrectly; in one example, an engineer had literally wrapped and hosted one microservice within another because he didn’t understand how the services were supposed to communicate if they were in separate processes (or on separate machines). These engineers also had a tough time understanding how services should be tested, deployed, and monitored because they were so used to the traditional “throw the service over the fence” to an admin approach to deployment. This basically lead to huge amounts of churn and loss of productivity.
  4. Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes — computer vision doing more amazing things: annotate scenes (e.g., sunsets, seasons), train, then be able to adjust images. Tweak how much sunset there is in your pic? Wow.

Velocity highlights (video bonus!)

A collection of must-see keynotes from Velocity Santa Clara, with bonus videos of some of the best sessions.

Editor’s note: this post originally appeared on Steve Souders’ blog; it is published here with permission.

We’re in the quiet period between Velocity Santa Clara and Velocity New York. It’s a good time to look back at what we saw and look forward to what we’ll see this September 15-17 in NYC.

Velocity Santa Clara was our biggest show to date. There was more activity across the attendees, exhibitors, and sponsors than I’d experienced at any previous Velocity. A primary measure of Velocity is the quality of the speakers. As always, the keynotes were livestreamed — the people who tuned in were not disappointed. I recommend reviewing all of the keynotes from the Velocity YouTube Playlist. All of them were great, but here’s a collection of some of my favorites.

Virtual Machines, JavaScript and Assembler

Start. Here. Scott Hanselman’s walk through the evolution of the web and cloud computing is informative and hilarious:

Four short links: 4 July 2014

Deleted Transparency, Retro Theme, MPA Suckage, and Ultrasonic Comms

  1. The Flipside of the Right To Be Forgotten (Business Insider) — deletion requests were granted for a former politician who wanted to remove links to a news article about his behavior when previously in office – so that he can have a clean slate when running for a new position – and a man who was convicted of possessing child sexual abuse imagery.
  2. BOOTSTRA.386 — gorgeously retro theme for Bootstrap.
  3. Multi-Process Architectures Suck — detailed and painful look at the computational complexity and costs of multiprocess architectures.
  4. Chromecast Ultrasonic Comms — In the new system, Chromecast owners first allow support for nearby devices. A nearby device then requests access to the Chromecast, and the Chromecast plays an ultrasonic sound through the connected TV’s speakers. The sound is then picked up by the microphone in the device, which allows it to pair with the TV. (via Greg Linden)

Revisiting “What is DevOps”

If all companies are software companies, then all companies must learn to manage their online operations.


Two years ago, I wrote What is DevOps. Although that article was good for its time, our understanding of organizational behavior, and its relationship to the operation of complex systems, has grown.

A few themes have become apparent in the two years since that last article. They were latent in that article, I think, but now we’re in a position to call them out explicitly. It’s always easy to think of DevOps (or of any software industry paradigm) in terms of the tools you use; in particular, it’s very easy to think that if you use Chef or Puppet for automated configuration, Jenkins for continuous integration, and some cloud provider for on-demand server power, you’re doing DevOps. But DevOps isn’t about tools; it’s about culture, and it extends far beyond the cubicles of developers and operators. As Jeff Sussna says in Empathy: The Essence of DevOps:

…it’s not about making developers and sysadmins report to the same VP. It’s not about automating all your configuration procedures. It’s not about tipping up a Jenkins server, or running your applications in the cloud, or releasing your code on Github. It’s not even about letting your developers deploy their code to a PaaS. The true essence of DevOps is empathy.

Four short links: 30 June 2014

Interacting with Connected Objects, Continuous Security Review, Chess AI, and Scott Hanselman is Hilarious

  1. Interacting with a World of Connected Objects (Tom Coates) — notes from one of my favourite Foo Camp sessions.
  2. Security Considerations with Continuous Deployment (IBM) — rundown of categories of security issues your org might face, and how to tackle them in the continuous deployment cycle. (via Emma Jane Westby)
  3. The Chess Master and the Computer (Garry Kasparov) — Increasingly, a move isn’t good or bad because it looks that way or because it hasn’t been done that way before. It’s simply good if it works and bad if it doesn’t. Although we still require a strong measure of intuition and logic to play well, humans today are starting to play more like computers. (via Alexis Madrigal)
  4. Virtual Machines, Javascript, and Assembler (YouTube) — hilarious Velocity keynote by Scott Hanselman.
Four short links: 27 June 2014

Google MillWheel, 20yo Bug, Fast Real-Time Visualizations, and Google's Speed King

  1. MillWheel: Fault-Tolerant Stream Processing at Internet Scale — Google Research paper on the tech underlying the new cloud DataFlow tool. Watch the video. Yow.
  2. The Integer Overflow Bug That Went to Mars — long-standing (20 year old!) bug in a compression library prompts a wave of new releases. No word yet on whether NASA will upgrade the rover to avoid being pwned by Martian script kiddies. (update: I fell for a self-promoter. The Martians will need to find another attack vector. Huzzah!)
  3. epoch (github) — Fastly-produced open source general purpose real-time charting library for building beautiful, smooth, and high performance visualizations.
  4. Achieving Rapid Response Times in Large Online Services (YouTube) — Jeff Dean’s keynote at Velocity. He wrote … a lot of things for this. And now he’s into deep learning ….

Four short links: 26 June 2014

IoT Future, Latency Numbers, Mobile Performance, and Minimum Viable Bureaucracy

  1. Charlie Stross on 2034 — every object in the real world is going to be providing a constant stream of metadata about its environment — and I mean every object. The frameworks used for channeling this firehose of environment data are going to be insecure and ramshackle, with foundations built on decades-old design errors. (via BoingBoing)
  2. Latency Numbers Every Programmer Should Know — awesome animation so you can see how important “constants” which drive design decisions have changed over time.
  3. Extreme Web Performance for Mobile Devices (Slideshare) — notes from Maximiliano Firtman’s Velocity tutorial.
  4. Minimum Viable Bureaucracy (Laura Thomson) — notes from her Velocity talk. A portion of engineer’s time must be spent on what engineer thinks is important. It may be 100%. It may be 60%, 40%, 20%. But it should never be zero.
Four short links: 20 June 2014

Available Data, Goal Setting, Real Tech, and Gamification Numbers

  1. Dynamo and BigTable — good preso overview of two approaches to solving availability and consistency in the event of server failure or network partition.
  2. Goals Gone Wild (PDF) — In this article, we argue that the beneficial effects of goal setting have been overstated and that systematic harm caused by goal setting has been largely ignored. We identify specific side effects associated with goal setting, including a narrow focus that neglects non-goal areas, a rise in unethical behavior, distorted risk preferences, corrosion of organizational culture, and reduced intrinsic motivation.
  3. Tech Isn’t All Brogrammers (Alexis Madrigal) — a reminder that there are real scientists and engineers in Silicon Valley working on problems considerably harder than selling ads and delivering pet food to one another. (via Brian Behlendorf)
  4. Numbers from 90+ Gamification Case Studies — cherry-picked anecdata for your business cases.

From the network interface to the database

All systems are distributed systems, and we’re starting to see how they fit into Velocity's themes.


From the beginning, the Velocity Conference has focused on web performance and operations — specifically, web operations. This focus has been fairly narrow: browser performance dominated the discussion of “web performance,” and interactions between developers and IT staff dominated operations.

These limits weren’t bad. Perceived performance really is dominated by the browser — how fast you can get resources (HTML, images, CSS files, JavaScript libraries) over the network to the browser, and how fast the browser can execute those resources. How long before a user stops waiting for your page to load and clicks away? How do you make a page useable as quickly as possible, even before all the resources have loaded? Those discussions were groundbreaking and surprising: users are incredibly sensitive to page speed.

That’s not to say that Velocity hasn’t looked at the rest of the application stack; there’s been an occasional glance in the direction of the database and an even more occasional glance at the middleware. But the database and middleware have, at least historically, played a bit part. And while the focus of Velocity has been front-end tuning, speakers like Baron Schwartz haven’t let us ignore the database entirely.