Efficient, Effective Communication Still Often Elusive

In the operational environment, miscommunication can be costly, but there are some easy ways to improve how we communicate.

Editor’s note: This is part two in a four-part series on the “-ations” of aviation that can provide further insight into DevOps best practices and achieving them. Part one, on how standardization helps organizations scale and is actually a part of healthy DevOps culture, can be read here.

Communication is an enigmatic topic when it comes to engineering. Parts of our jobs—blueprints, chemical formulae, and source code—require extremely precise forms of communication (even if it doesn’t end up communicating to the steel, molecules, or silicon what we intended). But when it comes to email threads sifting through requirements, meetings about implementation styles and risk assessment, and software design documentation, we often fumble.

Let’s face it: there’s a reason the “engineer equals bad communicator” stereotype exists. But there are some simple things that can be done, both individually and technologically, to begin challenging that stereotype.

Dual Navigation Receivers Required

There are obviously many forms of communication. In an operational context, it’s useful to distinguish between static and active communication.

In aviation, static communication is what we see in notices-to-airmen (NOTAMs), flight plan details, weather reports, checklists, and other general data indirectly related to a specific flight. It is optimized for mass consumption by users ranging from pilots studying weather at their destination to the numerous people (controllers and airline dispatchers) involved in shepherding a flight on its way. As such, it needs to fulfill diverse needs, and the structure, formatting, and flow of the information in the document are very important. The image below is an example of flight control strips, which air traffic controllers still use.

[Image: flight progress strips]
Flight control strips like these are an example of static communication: they give all the necessary information to track and hand-off aircraft to other controllers, in an encoded, standardized format.

Active communication, as the name implies, is “in motion”: radio conversations between pilots and controllers or crew discussions in the cockpit. Active communication is tied to the execution of specific tasks and, therefore, has timeliness requirements which may not be present in its static counterpart. Here’s an example of active communication, orchestrated by San Francisco Tower during an afternoon rush; all aviation communication requires efficiency and clarity, but since all of SFO’s runways intersect, the need is especially palpable.

When we optimize for operations, both forms have their own “dictionary,” with standard phraseology. The active form also has a requirement that speaker and listener be able to signal that they are in an “exceptional state,” and that dictionary may go out the window. Pilots declaring an emergency, for example, serves this purpose: while the communication remains terse, focused, and as efficient as possible, the conditions may require discussion of non-standard behavior: you never know what’s going to come out of a pilot’s mouth during an emergency when she’s asked “Say intentions.”

How does this apply to writing and shipping bits? Let’s take a look at static communication first.

NAV ILS RWY 28L UNREL BROADCASTING MISLEADING INFORMATION

There exist entire books and PhD theses devoted to writing software design documents. And given everyone I know hates doing it, most organizations skimp on it, and what is produced is often less-than-useful, I’ll go out on a limb and say it continues to be an unsolved problem.

But we don’t have to require NASA-esque documentation; there exist some simple ways to improve an organization’s static communication artifacts:

Source code style guide
Do you have one? Most places do. But does anyone follow it? Style guides are often contentious, but their value is more than just “tabs or spaces?” (which sometimes actually matters!). In languages like C++ or even JavaScript, having a “language lawyer” (an engineer on the team who’s responsible for knowing the language’s dusty corners) craft style rules can help the entire team avoid subtle bugs. Teams that aren’t just paying lip service to their style guides add these rules to the unit test suite.
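
As a rough illustration of what “adding style rules to the test suite” might look like, here is a minimal sketch in Python that checks JavaScript source text. The three rules, the 100-character limit, and the function name find_style_violations are all made up for this example, and the regular expression is deliberately naive (it will also flag a == inside a string or comment); a real team would more likely lean on an established linter.

```python
import re

# Hypothetical rules for illustration; a real team would encode its own guide.
MAX_LINE_LENGTH = 100
# Matches JavaScript's loose `==` but not `===`, `!==`, `<=`, or `>=`.
LOOSE_EQUALITY = re.compile(r"(?<![=!<>])==(?!=)")


def find_style_violations(source: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for each rule broken in `source`."""
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.startswith("\t"):
            violations.append((number, "indent with spaces, not tabs"))
        if len(line) > MAX_LINE_LENGTH:
            violations.append((number, "line too long"))
        if LOOSE_EQUALITY.search(line):
            violations.append((number, "use === instead of =="))
    return violations
```

A unit test could then read every source file in the repository and assert that find_style_violations returns an empty list, so a rule violation fails the build the same way a broken feature would.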

Formatting for commit logs and other artifacts
Have you defined commit message, bug summary, and checklist formats? Are commit messages validated by the version control system? Does on-boarding training include bug etiquette? Commit messages and bug reports, in particular, are the lifeblood of the software development process. We’ve all had the experience of trying to clarify the context of a bug in our own heads, only to have to sift through commit messages like “Initial commit of the foo feature,” “Fix unit tests,” and the omnipresent “LOL Oops!” (Admittedly, git’s ability to squash and reorder commits can mitigate this; but I’m surprised at how often I continue to see messages like this in the “official” histories of production code.)
We might, and should, think twice about getting on a plane in a thunderstorm if NOTAMs were full of “LOL ILS borked; dunno why!” and flight plans were littered with pilots swearing about deviations. These comments are hilarious until you’re the one up at 3 am, stuck, pulling hair out, trying to piece together a bug’s provenance.
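
To make “validated by the version control system” concrete, here is a minimal sketch of a checker that could be wired into git’s commit-msg hook, which git runs with the path to the proposed message file as its first argument. The “Bug 1234: summary” format, the 72-character limit, and the function names are assumptions invented for this example, not a recommendation of any particular convention.

```python
import re
import sys

# Hypothetical house format: "Bug <number>: <summary>", at most 72 characters.
SUBJECT_PATTERN = re.compile(r"^Bug (\d+): (\S.*)$")
MAX_SUBJECT_LENGTH = 72


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT_PATTERN.match(subject):
        problems.append("subject must look like 'Bug 1234: short summary'")
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject longer than {MAX_SUBJECT_LENGTH} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    return problems


def main(argv: list[str]) -> int:
    # Git passes the path of the file holding the proposed message as argv[1].
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_commit_message(handle.read())
    for problem in problems:
        print(f"commit rejected: {problem}", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit makes git abort the commit


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as an executable .git/hooks/commit-msg, a script like this would turn away “LOL Oops!” before it ever reaches the history. A local hook only covers one clone, so a team wanting a real guarantee would need the same check on the server side as well.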

Telling a build’s life story
Does your software, just by running it, provide enough identifying information to determine details like what time the build occurred? Can you use that information to retrieve a complete build log? A list of commits? The bug numbers fixed? Can you link it back to the build’s test results? I call this the “chain of evidence”: every build should have some sort of unique identifier that teams can easily map to bug lists, commit lists, test results, debugging symbols, build logs, and related (third-party) artifacts. And if you can, can you do it for a build you shipped nine months ago, or did it get blown away last time the Jenkins server’s drive filled up?
Linking this data together is the lowest-hanging fruit when it comes to improving communication; after all, most of it is generated automatically with every build! And yet many (even mature!) organizations don’t have it available, much less linked, for their important builds.
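
One way to picture the chain of evidence is as a single record per build, keyed by the identifier the software reports about itself. The sketch below is only an illustration under stated assumptions: the field names, the BuildRecord and BuildIndex classes, and the in-memory dictionary are all hypothetical, and a real system would keep this somewhere more durable than the build server’s disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    """Everything needed to trace one build; all field names are illustrative."""
    build_id: str                # unique identifier stamped into the software
    built_at: str                # ISO 8601 timestamp of the build
    commits: tuple[str, ...]     # commit hashes that went into this build
    bugs_fixed: tuple[int, ...]  # bug numbers referenced by those commits
    build_log: str               # where the full build log is archived
    test_report: str             # where the test results are archived


class BuildIndex:
    """In-memory stand-in for a durable store keyed by build identifier."""

    def __init__(self) -> None:
        self._by_id: dict[str, BuildRecord] = {}

    def register(self, record: BuildRecord) -> None:
        self._by_id[record.build_id] = record

    def lookup(self, build_id: str) -> BuildRecord:
        """Go from the identifier a running build reports to its full record."""
        return self._by_id[build_id]

    def builds_fixing(self, bug: int) -> list[str]:
        """Answer the reverse question: which builds contain a given bug fix?"""
        return sorted(
            record.build_id
            for record in self._by_id.values()
            if bug in record.bugs_fixed
        )
```

The point of the design is that every question in the list above becomes one lookup: the identifier leads to the log, the commits, the bugs, and the test results, and a bug number leads back to the builds that contain its fix.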

“Read-back correct; contact Ground for taxi”

To say that active communication is “more complex” than the static form would be an understatement. But there are some simple lessons we can steal from aviation and apply to the software operational environment.

Resolve ambiguous language
This may seem like a bit of a no-brainer, but communication often breaks down when teams don’t actively agree to use the same terminology. Many of you have probably lived the problem where the Ops team calls a component something different from what developers call it; DevOps collaboration may uncover this piecemeal, but teams that insist on continuing to use mismatched terminology are in for a rude awakening. Consider the following (true) story: a project manager consistently conflated the concepts of the product’s installer, upgrader, and first-run bundler. The developers all thought it quaint, and stopped bothering to correct him after the fifth or sixth time… until he sold a huge customer on a feature that was only implemented in one, precisely because he went on using incorrect terminology. Engineering spent the next two months of weekends “correcting” the miscommunication by having to implement the feature, all because the organization didn’t value precise communication.

Defining specific system states
In another (true) story, a team noticed they were having the same two-hour meeting every two weeks, arguing over the definition of a “release candidate.” The team solved this by defining, outside of the context of a specific release, exactly what quality level, bug count, and build artifacts would be required for each milestone. This exercise reduced those meetings to thirty minutes; but it also had a couple of positive unintended consequences: the terms became an easy (and surprisingly accurate) proxy to communicate risk about the entire release. And more interestingly: when someone started quibbling about the definitions the team had agreed upon, à la “I know it’s the final RC and we’re not code complete, but just ignore it,” it was easy to notice other pressures were in play and bring them up to be addressed directly.
Milestones are but one of many “stateful terms” an organization might define that end up providing value, especially in the heat of “active communication.”
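
A milestone definition of this kind can be written down as data rather than argued about in a meeting. The sketch below is a minimal illustration, assuming made-up thresholds and artifact names; MilestoneCriteria, ReleaseStatus, and unmet_criteria are hypothetical names, not anything the team in the story actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MilestoneCriteria:
    """Release-independent definition of a milestone; thresholds are made up."""
    name: str
    max_open_blockers: int
    code_complete_required: bool
    required_artifacts: frozenset[str]


@dataclass(frozen=True)
class ReleaseStatus:
    """Snapshot of where a particular release stands right now."""
    open_blockers: int
    code_complete: bool
    artifacts: frozenset[str]


def unmet_criteria(criteria: MilestoneCriteria, status: ReleaseStatus) -> list[str]:
    """List every reason the release does not yet qualify for the milestone."""
    reasons = []
    if status.open_blockers > criteria.max_open_blockers:
        reasons.append(
            f"{status.open_blockers} open blockers (limit {criteria.max_open_blockers})"
        )
    if criteria.code_complete_required and not status.code_complete:
        reasons.append("not code complete")
    missing = sorted(criteria.required_artifacts - status.artifacts)
    if missing:
        reasons.append("missing artifacts: " + ", ".join(missing))
    return reasons


# Hypothetical definition, agreed on once and reused for every release.
RELEASE_CANDIDATE = MilestoneCriteria(
    name="release candidate",
    max_open_blockers=0,
    code_complete_required=True,
    required_artifacts=frozenset({"installer", "debug symbols", "release notes"}),
)
```

With the definition fixed ahead of time, “just ignore it” has to be said out loud against an explicit list of unmet criteria, which is exactly what makes the outside pressure easy to spot.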

The Value Is Not In What You Say, But What People Hear

Really examining the dynamics of our communication can be a bit like fish analyzing the water in which they swim: for something we do constantly every day, it can seem a bit “meta” and odd to spend any time thinking about how individuals and teams communicate, what human factors can be leveraged to improve the quality of that communication (both day-to-day and in crisis-modes), and why it brings any value.

But valuing accurate, efficient communication as a part of an operational culture can have a huge impact. There’s a reason we structure written language using a standard: it reduces the cognitive load required to communicate ideas, which is what source code, operations run-books, and those pesky design documents really are all about.

Accepting this cognitive reality, and embracing as a cultural value a common language and dialect or style in an operational environment (even if it’s not your preferred one), helps keep the system consistent and functioning. It also increases communication bandwidth, which improves efficiency, especially in emergency/outage scenarios, where seconds count and miscommunication can have a huge impact.

Combining this mindfulness about how we humans communicate with each other and annotating and connecting the data our systems generate constantly can bring tangible, daily benefits to the commit-test-deploy cycle for teams across the entire organization. After your organization begins mindful standardization, with some tools to increase the bandwidth of communication between engineers and teams, operational expectations can be set for all sorts of situations. We’ll take a look at why that’s useful and how you can do that for your own operational environment next week.
