Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.
In this week’s episode, O’Reilly’s Mac Slocum talks to Astrid Atkinson, director of software engineering at Google, about the delicate balance of managing complexity in distributed systems and her experience working on-call rotations at Google.
Here are a few snippets from their chat:
I think it’s often really hard for organizations that are scaling quickly to find time to manage complexity in their systems. That can be really a trap, because if you’re really always just focused on the next deadline or whatever, and never planning for what you’re going to live with when you’re done, then you might never find the time.
You can only optimize what you pay attention to, and so if you can’t see what your system is doing, if you can’t see whether it’s working, it’s not working.
I used to get paged awake at two in the morning. You go from zero to Google is down. That’s a lot to wake up to.
The compute management space is really interesting because it’s sort of one of the fundamental shifts in the way that computing works — when you get away from managing an OS, managing machine, managing a BIOS, managing the hardware, to where it’s just managing the thing that you want to do with it. It’s profoundly empowering.
Also in this podcast…
In the second segment, Slocum talks to Kelsey Hightower, developer advocate and toolsmith at CoreOS, about the state of distributed computing and managing programming language dependencies.