Astrid Atkinson talks about what it means to manage distributed systems over long periods of time. She says she starts by thinking about the team, not the system. “Good teams and people are precious...building a good team is really difficult, a huge investment,” she notes. “Your job as an engineer is to make sure that adding scale doesn’t mean adding people.”
The O'Reilly Radar Podcast: Astrid Atkinson on optimization, and Kelsey Hightower on distributed computing.
In this week’s episode, O’Reilly’s Mac Slocum talks to Astrid Atkinson, director of software engineering at Google, about the delicate balance of managing complexity in distributed systems and her experience working on-call rotations at Google.
Here are a few snippets from their chat:
I think it’s often really hard for organizations that are scaling quickly to find time to manage complexity in their systems. That can be really a trap, because if you’re really always just focused on the next deadline or whatever, and never planning for what you’re going to live with when you’re done, then you might never find the time.
You can only optimize what you pay attention to, and so if you can’t see what your system is doing, if you can’t see whether it’s working, it’s not working.
I used to get paged awake at two in the morning. You go from zero to Google is down. That’s a lot to wake up to.
Mapping the future of development by designing for distributed architectures.
With the advent of DevOps and various Platform-as-a-Service (PaaS) environments, many complex business requirements need to be met within a much shorter timeframe. The Internet of Things (IoT) is also changing how established applications and infrastructures are constructed. As a result of these converging trends, the enterprise IT landscape is becoming increasingly distributed, and the industry is starting to map how all the various components — from networking and middleware platforms, to ERP systems and microservices — will come together to create a new development paradigm that exists solely in the cloud.
A deep integration across design, development, and operations is critical to digital business success.
I just finished reading Thomas Wendt’s wonderful book, Design for Dasein. I recommend it to anyone who practices, or just is interested in, experience design. Wendt’s ideas have profound implications for rethinking and improving our approach to designing experiences. They also have profound implications for how we think about DevOps, and its relationship to design, and how that relationship impacts the nature and purpose of digital business.
Design for Dasein introduces what Wendt calls “phenomenological design thinking.” This is a new approach to design that expands the designer’s attention beyond creating things that people use, to encompass thinking about the ways in which things influence, interact with, and are influenced by how people experience the world. Phenomenological design thinking reflects two key insights about the role of designed objects in people’s lives. First, designers create possibilities for use rather than rigid solutions. Wendt cites the example of using an empty Coke bottle to hold open a door in an old, crooked apartment. By itself, the bottle wasn’t heavy enough to keep the door from swinging shut, so he filled it with pennies. At that point, the bottle suddenly had three overlapping uses: containing and drinking soda, holding open one’s bedroom door, and storing spare change. Wendt’s point is that the designer does not entirely control the object’s destiny. That destiny is co-created by the designer and the user.
Moving beyond ad-hoc automation to take advantage of patterns that deliver predictable capabilities.
Can you release new features to your customers every week? Every day? Every hour? Do new developers deploy code on their first day, or even during job interviews? Can you sleep soundly after a new hire’s deployment knowing your applications are all running perfectly fine? A rapid release cadence, with the processes, tools, and culture that support the safe and reliable operation of cloud-native applications, has become the key strategic factor for software-driven organizations that are shipping software faster with reduced risk. When you are able to release software more rapidly, you get a tighter feedback loop that allows you to respond more effectively to the needs of customers.
Continuous delivery is why software is becoming cloud-native: shipping software faster to reduce the time of your feedback loop. DevOps is how we approach the cultural and technical changes required to fully implement a cloud-native strategy. Microservices is the software architecture pattern used most successfully to expand your development and delivery operations and avoid slow, risky, monolithic deployment strategies. It’s difficult to succeed, for example, with a microservices strategy when you haven’t established a “fail fast” and “automate first” DevOps culture.
Continuous delivery, DevOps, and microservices describe the why, how, and what of being cloud-native. These competitive advantages are quickly becoming the ante to play the software game. In the most advanced expression of these concepts, they are intertwined to the point of being inseparable. This is what it means to be cloud-native.
Designing, building, and operating services from the perspective of customer goals helps improve quality.
We often tend to think about “usability” as applying to a separate layer of digital service from functionality or operability. We treat it as a characteristic of an interface which intermediates between the user and an application’s utility. Operational concerns such as performance, resilience, or security are even further removed. This approach gets reflected in siloed design-development-operations practices. From the perspective of service quality, though, I think it may be more constructive to view usability as a characteristic of service as a whole.
What is service, anyway? In the language of service-dominant logic, it’s something that helps a customer accomplish a job-to-be-done. From that perspective, usability refers to the customer’s ability to ‘use’ the service to accomplish their goals. Everything that contributes to, or compromises, that ability impacts usability.
How orchestration differs from automation in the enterprise cloud.
The orchestration of workflow processes is an essential part of cloud computing. Without orchestration, many of the benefits and characteristics of cloud computing cannot be achieved at the price point at which cloud services should be offered. Failure to automate as many processes as possible results in higher labor costs, slower delivery of new services to customers, and ultimately higher cost with less reliability.
What is meant by automation? Automation is a technique used in traditional data centers, and critical in a cloud environment, to install software or initiate other activities. Traditional IT administrators use sequential scripts to perform a series of tasks (e.g., software installation or configuration); however, this is now considered an antiquated technique in a modern cloud-based environment. Orchestration differs from automation in that it does not rely entirely on static sequential scripts but rather on sophisticated workflows; multiple automated threads; query-based and if/then logic; object-oriented and topology workflows; and even the ability to back out a series of automated commands if necessary.
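To make the distinction concrete, here is a minimal Python sketch of two of the traits named above: if/then logic and the ability to back out completed commands. It is an illustration only, not how any particular cloud management product works. The `Step` class, the `orchestrate` function, and the step names (`allocate_vm`, `attach_storage`, `configure_network`) are all hypothetical, and the "resources" are plain strings in a list rather than real infrastructure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One unit of work in a hypothetical orchestration workflow."""
    name: str
    run: Callable[[Dict], None]    # performs the action; may raise
    undo: Callable[[Dict], None]   # reverses the action
    when: Optional[Callable[[Dict], bool]] = None  # if/then guard; None = always


def orchestrate(steps: List[Step], ctx: Dict) -> bool:
    """Run steps in order; on failure, back out finished steps in reverse."""
    done: List[Step] = []
    for step in steps:
        if step.when is not None and not step.when(ctx):
            ctx["log"].append(f"skip {step.name}")
            continue
        try:
            step.run(ctx)
            ctx["log"].append(f"run {step.name}")
            done.append(step)
        except Exception as exc:
            ctx["log"].append(f"fail {step.name}: {exc}")
            for finished in reversed(done):
                finished.undo(ctx)
                ctx["log"].append(f"undo {finished.name}")
            return False
    return True


# Hypothetical actions: each one only edits an in-memory list.
def allocate_vm(ctx): ctx["resources"].append("vm")
def release_vm(ctx): ctx["resources"].remove("vm")
def attach_storage(ctx): ctx["resources"].append("disk")
def detach_storage(ctx): ctx["resources"].remove("disk")
def remove_network(ctx): ctx["resources"].remove("nic")


def configure_network(ctx):
    # Simulated failure so the back-out path can be exercised.
    if ctx["order"].get("network_fails"):
        raise RuntimeError("no free IP address")
    ctx["resources"].append("nic")


def new_context(order: Dict) -> Dict:
    return {"order": order, "resources": [], "log": []}


WORKFLOW = [
    Step("allocate_vm", allocate_vm, release_vm),
    Step("attach_storage", attach_storage, detach_storage,
         when=lambda c: c["order"].get("storage_gb", 0) > 0),
    Step("configure_network", configure_network, remove_network),
]

if __name__ == "__main__":
    ctx = new_context({"storage_gb": 50, "network_fails": True})
    print(orchestrate(WORKFLOW, ctx))
    print(ctx["log"])
```

A static sequential script would stop at the failed network step and leave the virtual machine and disk behind. The sketch instead releases them in reverse order, and it skips the storage step entirely when the order does not ask for storage.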
Orchestration can best be explained through a typical use case example of a customer placing an order within their cloud service web-portal, and following the steps necessary to bring the service online. The actions below illustrate a very high level scenario where the cloud management software performs the orchestration: