"automation" entries

Automation Myths

The NSA Can't Replace 90% of Its System Administrators

In the aftermath of Edward Snowden’s revelations about the NSA’s domestic surveillance activities, the NSA recently announced that it plans to get rid of 90% of its system administrators via software automation in order to “improve security.” So far, I’ve mostly seen this piece of news reported and commented on straightforwardly. But it simply doesn’t add up. Either the NSA has such a monumental (yet not necessarily surprising) level of bureaucratic bloat that it could feasibly cut that many staff regardless of automation, or it is simply going to be less effective once it has reduced its staff.

I talked with a few people who are intimately familiar with the kind of software that would typically be used to automate traditional sysadmin tasks (Puppet and Chef). Typically, their products are used to allow an existing group of operations people to do much more, not to do the same amount of work with significantly fewer people. The magical thinking that the NSA can actually put in automation sufficient to do away with 90% of its system administration staff betrays some fundamental misunderstandings about automation. I’ll tackle the two biggest ones here.

1. Automation replaces people. Automation is about gaining leverage: it streamlines the tasks that computers can handle so that people have more brainpower left for everything else. As James Turnbull, former VP of Business Development for Puppet Labs, said to me, “You still need smart people to think about and solve hard problems.” (Whether you agree with the types of problems the NSA is trying to solve is a completely different thing, of course.) In reality, the NSA should have been working on automation regardless of the Snowden affair. It has a massive, complex infrastructure. Deploying a new data center, for example, is a huge undertaking; it’s not something you can automate.

Or as Seth Vargo, who works for Opscode, the creators of the configuration management software Chef, puts it, “There’s still decisions to be made. And the machines are going to fail.” Sascha Bates (also with Opscode) chimed in to point out that “This presumes that system administrators only manage servers.” It’s a naive view. Are the DBAs going away, too? Network administrators? As I mentioned earlier, the NSA has a massive, complicated infrastructure that will always require people to manage it. That, plus all the work that isn’t (theoretically) being automated, will now fall on the remaining 10% who don’t get laid off. And that remaining 10% will still have access to the same information.

2. Automation increases security. Automation increases consistency, which can have a relationship with security. Prior to automating something, you might have a wide variety of people doing the same thing in varying ways, and hence with varying outcomes. From a security standpoint, automation improves infrastructure security and makes it auditable. But it doesn’t really increase data/information security (e.g. this file can/cannot live on that server); those too are human tasks requiring human judgement. And that’s just the kind of information Snowden got his hands on. This is another example of a government agency overreacting to a low-probability event after the fact. Getting rid of 90% of their sysadmins is the IT equivalent of still requiring airline passengers to take off their shoes and cram their tiny shampoo bottles into plastic baggies; it’s security theater.

There are a few upsides, depending on your perspective on this whole situation. First, if your company is in the market for system administrators, you might want to train your recruiters on D.C. in the near future. Additionally, odds are the NSA is going to be less effective than it is right now. Perhaps, like the CIA, they are also courting Amazon Web Services (AWS) to help run their own private cloud, but again, as Sascha said, managing servers is only a small piece of the system administrator picture.

If you care about or are interested in automation, operations, and security, please join us at Velocity New York on October 14-16. Dr. Nancy Leveson will be delivering a fantastic keynote on security and complex systems.

The Rise of Infrastructure as Data

Simplifying IT automation

IT infrastructure should be simpler to automate. A new method of describing IT configurations and policy as data formats can help us get there. To understand this conclusion, it helps to understand how the existing tool chains of automation software came to be.

In the early days of IT infrastructure, administrators seeking to avoid redundant typing wrote scripts to help them manage their growing computer hordes. The development of these in-house automation systems was not without cost; each organization built its own redundant tools. As scripting gurus left an organization, new employees often found these scripts very difficult to maintain.

As we all know from the huge number of books written on the topic, doing software development right can require a large investment of time. Systems management software is especially complex, due to all the possible variables and corner cases to be managed. These in-house scripting systems often grew to be fragile.

Read more…

Zero Downtime Application Updates with Ansible

OSCON 2013 Speaker Series

Automating the configuration management of your operating systems and the rollout of your applications is one of the most important things an administrator or developer can do to avoid surprises when updating services, scaling up, or recovering from failures. However, it’s often not enough. Some of the most common operations that happen in your datacenter (or cloud environment) involve large numbers of machines working together and humans to mediate those processes. While we have been able to remove a lot of human effort from configuration, there has been a lack of software able to handle these higher-level operations.

I used to work for a hosted web application company where the IT process for executing an application update involved locking six people in a room, sometimes for 3-4 hours, with each person pressing the right buttons at the right time. This process almost always had a glitch somewhere: someone forgot to run the right command, or something wasn’t well tested beforehand. While some technical solutions were applied to handle configuration automation, nothing that could perform configuration could also accomplish that high-level choreography on top. This is why I wrote Ansible.

Ansible is a configuration management, application deployment, and IT orchestration system. One of Ansible’s strong points is having a very simple, human readable language – it allows users very fine, precise control over what happens on what machines at what times.

Getting started

To get started, create an inventory file (for instance, ~/ansible_hosts) that defines which machines you are managing and how they are organized into groups. Ansible can also pull inventory from multiple cloud sources, but an inventory file is a quick way to get started:

[webservers]
www01.example.com
www02.example.com
# add more webservers here

[monitoring]
nagios1.example.com

[lbservers]
haproxy1.example.com
haproxy2.example.com
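
To make the group structure concrete, here is a minimal Python sketch that reads the same inventory text into a dictionary of group names to hosts. It is only an illustration built on Python's standard `configparser` module, not how Ansible itself loads inventory; the `load_groups` name is hypothetical, and the sketch assumes every line is a bare hostname with no per-host variables.

```python
import configparser

# The inventory from above, embedded as a string so the sketch is self-contained.
INVENTORY = """\
[webservers]
www01.example.com
www02.example.com
# add more webservers here

[monitoring]
nagios1.example.com

[lbservers]
haproxy1.example.com
haproxy2.example.com
"""


def load_groups(text):
    """Return {group name: [hostnames]} for a simple INI-style inventory.

    Assumption: each entry is a bare hostname. allow_no_value lets
    configparser accept lines that have no "=" in them, and restricting
    the delimiter to "=" keeps a colon in an entry from being treated
    as a key/value separator.
    """
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.read_string(text)
    return {section: list(parser[section]) for section in parser.sections()}


groups = load_groups(INVENTORY)
for name, hosts in groups.items():
    print(name, hosts)
```

The comment line in the `[webservers]` group is skipped, so each group ends up holding only its hostnames.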

Now that you have defined what machines you are managing, you have to define what you are going to do on the remote machines.

Ansible calls this description of processes a “playbook,” and you don’t have to have just one; you can have different playbooks for different kinds of tasks.

Let’s look at an example for describing a rolling update process. This example is somewhat involved because it’s using haproxy, but haproxy is freely available. Ansible also includes modules for dealing with Netscalers and F5 load balancers, so this is just an example — ordinarily you would start more simply and work up to an example like this:
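
As a rough illustration of the choreography a rolling update involves, the Python sketch below walks the web servers in small batches: it takes each host out of every load balancer, updates it, and then puts it back. This is not Ansible code and not the author's playbook; `rolling_update`, `batch_size`, and the step names are hypothetical, and the load balancer actions are only recorded in a list rather than sent to haproxy.

```python
def rolling_update(webservers, lbservers, update, batch_size=1):
    """Update web servers a batch at a time and return the ordered steps.

    `update` is any callable that takes a hostname. Load balancer actions
    are recorded as tuples instead of being executed (an assumption made
    so the sketch stays self-contained).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    steps = []
    for start in range(0, len(webservers), batch_size):
        batch = webservers[start:start + batch_size]

        # 1. Take the whole batch out of rotation on every load balancer.
        for host in batch:
            for lb in lbservers:
                steps.append(("disable", lb, host))

        # 2. Update each host while it is receiving no traffic.
        for host in batch:
            update(host)
            steps.append(("update", host))

        # 3. Put the batch back into rotation before moving on.
        for host in batch:
            for lb in lbservers:
                steps.append(("enable", lb, host))
    return steps


updated = []
steps = rolling_update(
    ["www01.example.com", "www02.example.com"],
    ["haproxy1.example.com"],
    updated.append,  # stand-in for the real update action
)
for step in steps:
    print(step)
```

With a batch size of 1, only one web server is ever out of rotation, which is what keeps the application available during the update. If `update` raises an exception, the loop stops and the failed host stays disabled, so a broken release is not rolled onto the remaining servers.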
Read more…

Test-driven Infrastructure with Chef

Velocity 2013 Speaker Series

If you’re a System Administrator, you’re likely all too familiar with the 2:35am PagerDuty alert. “When you roll out testing on your infrastructure,” says Seth Vargo, “the number of alerts drastically decreases because you can build tests right into your Chef cookbooks.” We sat down to discuss his upcoming talk at Velocity, which promises to deliver many more restful nights for SysAdmins.

Key highlights from our discussion include:

  • There are not currently any standards regarding testing with Chef [Discussed at 1:09]
  • A recommended workflow that starts with unit testing [Discussed at 2:11]
  • Moving cookbooks through a “pipeline” of testing with Test Kitchen [Discussed at 3:11]
  • In the event that something bad does make it into production, you can roll back actual infrastructure changes [Discussed at 4:54]
  • Automating testing and cookbook uploads with Jenkins [Discussed at 5:40]


Four short links: 6 July 2012

UK Copyright Modernisation, Lessons from Cisco's Evil, Automation, and Kinect Tool

  1. HM Government Consultation on Modernising Copyright (PDF) — from all appearances, the UK Govt is prepared to be progressive and tech-savvy in considering updates to copyright law. Proof of the pudding is in the eating (i.e., wait and see whether the process is coopted by maximalists) but an optimistic start.
  2. Cisco Provides a Lesson (Eric Raymond) — This is why anyone who makes excuses for closed source in network-facing software is not just a fool deluded by shiny marketing but a malignant idiot whose complicity with what those vendors do will injure his neighbors as well as himself. […] If you don’t own it, it will surely own you.
  3. Automate or Perish (Technology Review) — As the MIT economist David Autor has argued, the job market is being “hollowed out.” […] Any work that is repetitive or fairly well structured is open to full or partial automation. Being human confers less and less of an advantage these days.
  4. Kinectable Pipe (Github) — command-line tool that writes skeleton data (as reported by Kinect) to stdout as text. Because Kinect programming is a pain in the neck, and by trivializing the device’s output into a simple text format, it becomes infinitely easier to digest in the scripting language of your choice.

Top Stories: October 17-21, 2011

The joys of animated geo data, Angry Birds and the future of mobile testing, and a look inside The Guardian's creative process.

This week on O'Reilly: Andy Kirk explained why data, maps and animation work so well together, we discovered the connection between a game-playing robot and the future of mobile app testing, and we learned how The Guardian develops its data journalism.

Operations is a competitive advantage… (Secret Sauce for Startups!)

My lunchtime conversations at the Summit centered around Operations as a competitive advantage (and occasionally a "strategic weapon"). This advantage is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally. Many people think of Operations as "a bunch of boring work… which I'm hoping someone else is doing." It often takes less…