Velocity 2013 Speaker Series
In 2002, US Secretary of Defense Donald Rumsfeld told a reporter that not only don’t we know everything important, but sometimes we don’t even know what knowledge we lack:
There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.
One of the purposes of monitoring is to build early-warning systems that alert us to problems before they become serious. But how can we recognize a failure in its early stages? It’s a thorny question.
OSCON 2013 Speaker Series
Automating the configuration management of your operating systems and the rollout of your applications is one of the most important things an administrator or developer can do to avoid surprises when updating services, scaling up, or recovering from failures. However, it’s often not enough. Some of the most common operations that happen in your datacenter (or cloud environment) involve large numbers of machines working together and humans to mediate those processes. While we have been able to remove a lot of human effort from configuration, there has been a lack of software able to handle these higher-level operations.
I used to work for a hosted web application company where the IT process for executing an application update involved locking six people in a room, sometimes for 3-4 hours, each person pressing the right buttons at the right time. The process almost always hit a glitch: someone forgot to run the right command, or something had not been well tested beforehand. While some technical solutions were applied to handle configuration automation, no tool that handled configuration could also manage the high-level choreography on top of it. This is why I wrote Ansible.
Ansible is a configuration management, application deployment, and IT orchestration system. One of Ansible’s strong points is its very simple, human-readable language, which gives users fine, precise control over what happens on which machines at what times.
To get started, create an inventory file (for instance, ~/ansible_hosts) that defines which machines you are managing. Machines are frequently organized into groups. Ansible can also pull inventory from multiple cloud sources, but an inventory file is a quick way to get started:
# add more webservers here
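To make the idea of a grouped inventory concrete, here is a minimal Python sketch that reads an INI-style inventory into a mapping of group name to hosts. The group names and host names are hypothetical placeholders, not taken from the post, and the parser is an illustration only: it is not how Ansible itself reads inventory, and it ignores host variables and other inventory syntax.

```python
import configparser

# Hypothetical inventory text; group and host names are placeholders.
INVENTORY_TEXT = """
[webservers]
www01.example.com
www02.example.com
# add more webservers here

[dbservers]
db01.example.com
"""


def parse_inventory(text):
    """Return {group_name: [host, ...]} from simple INI-style inventory text."""
    # allow_no_value lets a bare host name stand on a line by itself.
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.optionxform = str  # keep host names exactly as written
    parser.read_string(text)
    return {group: list(parser[group]) for group in parser.sections()}


groups = parse_inventory(INVENTORY_TEXT)
for group, hosts in groups.items():
    print(group, hosts)
```

Grouping matters because later steps can target a whole group, such as every web server, without listing machines one by one.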
Now that you have defined what machines you are managing, you have to define what you are going to do on the remote machines.
Ansible calls this description of processes a “playbook,” and you are not limited to one: you can have different playbooks for different kinds of tasks.
Let’s look at an example for describing a rolling update process. This example is somewhat involved because it’s using haproxy, but haproxy is freely available. Ansible also includes modules for dealing with Netscalers and F5 load balancers, so this is just an example — ordinarily you would start more simply and work up to an example like this:
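As a rough illustration of the choreography behind a rolling update, here is a minimal Python sketch. It is not Ansible syntax and not the playbook from this post. The `disable`, `update`, and `enable` callables are hypothetical stand-ins for whatever talks to your load balancer and deploys your application.

```python
def batches(hosts, size):
    """Yield successive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_update(hosts, batch_size, disable, update, enable):
    """Update hosts one batch at a time so the rest keep serving traffic.

    `disable`, `update`, and `enable` are hypothetical callables supplied
    by the caller, for example wrappers around a load balancer API.
    """
    for batch in batches(hosts, batch_size):
        for host in batch:
            disable(host)  # take the host out of the load balancer pool
        for host in batch:
            update(host)   # roll out the new application version
        for host in batch:
            enable(host)   # return the host to the pool


# Stand-in callables that only record what would happen, in order.
log = []
rolling_update(
    ["web1", "web2", "web3"],
    batch_size=2,
    disable=lambda h: log.append(("disable", h)),
    update=lambda h: log.append(("update", h)),
    enable=lambda h: log.append(("enable", h)),
)
for step in log:
    print(step)
```

The point of the batching is that only `batch_size` machines are ever out of rotation at once, which is the part that used to require people in a room pressing buttons in the right order.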
OSCON 2013 Speaker Series
Caching is the method that most improves response time in web applications (as Steve Souders shows in Cache is King), but in order to make use of it, every layer of your application must be configured for that purpose.
Most applications are initially developed with little or no use of caching and then must be refactored to fulfill performance goals. However, this approach incurs extra development costs that could be saved if response time is taken into consideration in the early stages of the development process.
The methodology that can save your life while you are still developing your application is pretty straightforward: keep caching in mind whenever you handle data in your system. Whether you are designing web APIs or internal backend data flows, ask one simple question:
Can I survive if the data seen by the user is not the latest?
Sometimes the answer to this question is ‘no.’ For example, I would be fired very quickly if I built a banking system that showed more money than a customer’s account really holds. On the other hand, if the system interacts with general data services like social networks, news, weather, car traffic, etc., there is less need to ensure the latest piece of information is immediately shown to the user.
Of course, the latest data eventually needs to reach the user. Data cannot be too old or you risk confusing the user, but configuring a short expiration time (say, 5-10 minutes or less) for dynamic data that can tolerate it can significantly improve response time. This is called temporal consistency, and it is crucial to a successful caching strategy.
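A short expiration time can be sketched as a small time-to-live cache. This is a minimal Python illustration under stated assumptions, not a production design: the `TTLCache` class, the `loader` callback, and the 300-second lifetime are all hypothetical choices made for the example.

```python
import time


class TTLCache:
    """Serve a stored value until it is older than `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling `loader(key)` only when it is stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the slow fetch
        value = loader(key)  # missing or expired: fetch and store again
        self._store[key] = (now + self.ttl, value)
        return value


# Usage: weather data that may be up to five minutes old.
def fetch_weather(city):
    # Hypothetical slow call to a remote service.
    return {"city": city, "fetched_at": time.time()}


cache = TTLCache(ttl_seconds=300)
first = cache.get("Lisbon", fetch_weather)
second = cache.get("Lisbon", fetch_weather)  # served from the cache
print(first is second)
```

The expiration time is the knob that answers the question above: the longer you can survive with stale data, the longer the lifetime and the fewer slow fetches you pay for.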
Nowadays, web applications are built by mashing up several web services from different sources. The best way to cope with their differing response times and data designs is to cache those elements for a limited time across all system layers. The same applies to data coming from your own system if it has to travel from one part of the world to another in several hops. If the information is not critical, consider caching it at an intermediate stage and reusing it when needed. Caching in the backend can avoid half of a trip. Even better is to cache on the target device or in a CDN, which can eliminate the full data trip or reduce it to only the last mile, an easy way to enhance performance.
A day in the life of DevOps, and the skills you'll need to enter the field.
In this Velocity podcast, OmniTI CEO Theo Schlossnagle discusses the skills of DevOps professionals and knowing how you've achieved excellence in the field.
Why "even faster" matters in the web performance and optimization world.
Steve Souders on the state of web performance, optimization and velocity.
A tribe of web performance and operations pros is pushing the web forward.
As we approach the fourth Velocity conference, here's a look at how the web performance and operations communities came together, what they've done to improve the web experience, and the work that lies ahead.
The state of the Velocity Conference.
Over its three-year history, the Velocity Conference has expanded to include mobile performance, "Velocity Culture," and a new line of bath products (that last one might not be the best fit).
Open Compute could be a big step forward for infrastructure, ops, and the web.
Jesse Robbins says Facebook's Open Compute Project represents a giant step for open source hardware, for the evolution of the web and cloud computing, and for infrastructure and operations in general.