Explore the declarative, idempotent, and stateless Puppet DSL.
Before we begin to explore practical best practices with Puppet, it’s valuable to understand the reasoning behind these recommendations.
Puppet can be somewhat alien to technologists who have a background in automation scripting. Where most of our scripts are procedural, Puppet is declarative. While a declarative language has many major advantages for configuration management, it does impose some interesting restrictions on the approaches we use to solve common problems.
Although Puppet’s design philosophy may not be the most exciting topic to begin this book, it drives many of the practices in the coming chapters. Understanding that philosophy will help contextualize many of the recommendations covered.
Empathy, communication, and collaboration across organizational boundaries.
I might try to define DevOps as the movement that doesn’t want to be defined. Or as the movement that wants to evade the inevitable cargo-culting that goes with most technical movements. Or the non-movement that’s resisting becoming a movement. I’ve written enough about “what is DevOps” that I should probably be given an honorary doctorate in DevOps Studies.
Baron Schwartz (among others) thinks it’s high time to have a definition, and that only a definition will save DevOps from an identity crisis. Without a definition, it’s subject to the whims of individual interest groups, and ultimately might become a movement that’s defined by nothing more than the desire to “not be like them.” Dave Zwieback (among others) says that the lack of a definition is more of a blessing than a curse, because it “continues to be an open conversation about making our organizations better.” Both have good points. Is it possible to frame DevOps in a way that preserves the openness of the conversation, while giving it some definition? I think so.
DevOps started as an attempt to think long and hard about the realities of running a modern web site, a problem that has only gotten more difficult over the years. How do we build and maintain critical sites that are increasingly complex, have stringent requirements for performance and uptime, and support thousands or millions of users? How do we avoid the “throw it over the wall” mentality, in which an operations team gets the fallout of the development teams’ bugs? How do we involve developers in maintenance without compromising their ability to release new software?
The NSA Can't Replace 90% of Its System Administrators
In the aftermath of Edward Snowden’s revelations about NSA’s domestic surveillance activities, the NSA has recently announced that they plan to get rid of 90% of their system administrators via software automation in order to “improve security.” So far, I’ve mostly seen this piece of news reported and commented on straightforwardly. But it simply doesn’t add up. Either the NSA has a monumental (yet not necessarily surprising) level of bureaucratic bloat that they could feasibly cut that amount of staff regardless of automation, or they are simply going to be less effective once they’ve reduced their staff. I talked with a few people who are intimately familiar with the kind of software that would typically be used for automation of traditional sysadmin tasks (Puppet and Chef). Typically, their products are used to allow an existing group of operations people to do much more, not attempting to do the same amount of work with significantly fewer people. The magical thinking that the NSA can actually put in automation sufficient to do away with 90% of their system administration staff belies some fundamental misunderstandings about automation. I’ll tackle the two biggest ones here.
1. Automation replaces people. Automation is about gaining leverage–it’s about streamlining human tasks that can be handled by computers in order to add mental brainpower. As James Turnbull, former VP of Business Development for PuppetLabs, said to me, “You still need smart people to think about and solve hard problems.” (Whether you agree with the types of problems the NSA is trying to solve is a completely different thing, of course.) In reality, the NSA should have been working on automation regardless of the Snowden affair. It has a massive, complex infrastructure. Deploying a new data center, for example, is a huge undertaking; it’s not something you can automate.
Or as Seth Vargo, who works for OpsCode–the creators of configuration management automation software Chef–puts it, “There’s still decisions to be made. And the machines are going to fail.” Sascha Bates (also with OpsCode) chimed in to point out that “This presumes that system administrators only manage servers.” It’s a naive view. Are the DBAs going away, too? Network administrators? As I mentioned earlier, the NSA has a massive, complicated infrastructure that will always require people to manage it. That plus all the stuff that isn’t (theoretically) being automated will now fall on the remaining 10% who don’t get laid off. And that remaining 10% will still have access to the same information.
2. Automation increases security. Automation increases consistency, which can have a relationship with security. Prior to automating something, you might have a wide variety of people doing the same thing in varying ways, hence with varying outcomes. From a security standpoint, automation provides infrastructure security, and makes it auditable. But it doesn’t really increase data/information security (e.g. this file can/cannot live on that server)–those too are human tasks requiring human judgement. And that’s just the kind of information Snowden got his hands on. This is another example of a government agency over-reacting to a low probability event after the fact. Getting rid of 90% of their sysadmins is the IT equivalent of still requiring airline passengers to take off their shoes and cram their tiny shampoo bottles into plastic baggies; it’s security theater.
There are a few upsides, depending on your perspective on this whole situation. First, if your company is in the market for system administrators, you might want to train your recruiters on D.C. in the near future. Additionally, odds are the NSA is going to be less effective than it is right now. Perhaps, like the CIA, they are also courting Amazon Web Services (AWS) to help run their own private cloud, but again, as Sascha said, managing servers is only a small piece of the system administrator picture.
If you care about or are interested in automation, operations, and security, please join us at Velocity New York on October 14-16. Dr. Nancy Leveson will be delivering a fantastic keynote on security and complex systems.
OSCON 2013 Speaker Series
Automating the configuration management of your operating systems and the rollout of your applications is one of the most important things an administrator or developer can do to avoid surprises when updating services, scaling up, or recovering from failures. However, it’s often not enough. Some of the most common operations that happen in your datacenter (or cloud environment) involve large numbers of machines working together and humans to mediate those processes. While we have been able to remove a lot of human effort from configuration, there has been a lack of software able to handle these higher-level operations.
I used to work for a hosted web application company where the IT process for executing an application update involved locking six people in a room for sometimes 3-4 hours, each person pressing the right buttons at the right time. This process almost always had a glitch somewhere where someone forgot to run the right command or something wasn’t well tested beforehand. While some technical solutions were applied to handle configuration automation, nothing that could perform configuration could really accomplish that high level choreography on top as well. This is why I wrote Ansible.
Ansible is a configuration management, application deployment, and IT orchestration system. One of Ansible’s strong points is having a very simple, human readable language – it allows users very fine, precise control over what happens on what machines at what times.
To get started, create an inventory file, for instance, ~/ansible_hosts that defines what machines you are managing, and which machines are frequently organized into groups. Ansible can also pull inventory from multiple cloud sources, but an inventory file is a quick way to get started:
[webservers] www01.example.com www02.example.com # add more webservers here [monitoring] nagios1.example.com [lbservers] haproxy1.example.com haproxy2.example.com
Now that you have defined what machines you are managing, you have to define what you are going to do on the remote machines.
Ansible calls this description of processes a “playbook,” and you don’t have to have just one, you could have different playbooks for different kinds of tasks.
Let’s look at an example for describing a rolling update process. This example is somewhat involved because it’s using haproxy, but haproxy is freely available. Ansible also includes modules for dealing with Netscalers and F5 load balancers, so this is just an example — ordinarily you would start more simply and work up to an example like this:
Velocity 2013 speaker series
“Puppet and Chef are completely different, and yet exactly the same,” admits Sascha Bates (@sascha_d). In this interview about her talk at the upcoming Velocity Conference, she discusses common pitfalls that people can avoid when getting started with configuration management. And here’s a hint: it isn’t about which tool you choose.
After years in the trenches helping a variety of organizations implement Chef, Sascha learned (often the hard way) a few critical things that she’ll share in her talk. Key points from our discussion include:
- When getting started with configuration management, people often fret over which tool they use, when they should be thinking more about the overall integration with their particular system. [Discussed at the 0:50 mark.]
- Both Chef and Puppet have the concept of a package manager, and if you’re not setting that up properly, things can spiral out of control quickly. [Discussed at the 1:25 mark.]
- Her top configuration management anti-patterns. [Discussed at the 2:43 mark.]
- What superpower someone will have after attending her talk. [Discussed at the 3:55 mark.]
Watch the whole interview here:
This is the first in a series of posts related to the upcoming Velocity Conference in Santa Clara, CA (June 18-20). We’ll be highlighting speakers in a variety of ways, from video and email interviews to posts by the speakers themselves.