Comparing different orchestration tools.
Most software systems evolve over time. New features are added and old ones pruned. Fluctuating user demand means an efficient system must be able to quickly scale resources up and down. Demands for near zero-downtime require automatic fail-over to pre-provisioned back-up systems, normally in a separate data centre or region.
On top of this, organizations often have multiple such systems to run, or need to run occasional tasks such as data-mining that are separate from the main system, but require significant resources or talk to the existing system.
When using multiple resources, it is important to make sure they are efficiently used — not sitting idle — but can still cope with spikes in demand. Balancing cost-effectiveness against the ability to quickly scale is difficult task that can be approached in a variety of ways.
All of this means that the running of a non-trivial system is full of administrative tasks and challenges, the complexity of which should not be underestimated. It quickly becomes impossible to look after machines on an individual level; rather than patching and updating machines one-by-one they must be treated identically. When a machine develops a problem it should be destroyed and replaced, rather than nursed back to health.
Various software tools and solutions exist to help with these challenges. Let’s focus on orchestration tools, which help make all the pieces work together, working with the cluster to start containers on appropriate hosts and connect them together. Along the way, we’ll consider scaling and automatic failover, which are important features.
How DevOps will help you surpass the common challenges of financial services software development.
Download a free copy of DevOps for Finance, an O’Reilly report by Jim Bird for the financial services software insider who’s heard about DevOps, but is unsure whether it represents solution or suicide.
DevOps, until recently, has been a story about unicorns. Innovative, engineering-driven online tech companies like Flickr, Etsy, Twitter, Facebook, and Google. Netflix and its Chaos Monkey. Amazon deploying thousands of changes per day.
DevOps was originally about WebOps at Internet companies working in the Cloud, and a handful of Lean Startups in Silicon Valley. It started at these companies because they had to move quickly, so they found new, simple, and collaborative ways of working that allowed them to innovate much faster and to scale much more effectively than organizations had done before.
But as the velocity of change in business continues to increase, enterprises — sometimes referred to as “horses,” in contrast to the unicorns referenced above — must also move to deliver content and features to customers just as quickly. These large organizations have started to adopt (and, along the way, adapt) DevOps ideas, practices, and tools.
This short book assumes that you have heard about DevOps and want to understand how DevOps practices like Continuous Delivery and Infrastructure as Code can be used to solve problems in financial systems at a trading firm, or a big bank or stock exchange. We’ll look at the following key ideas in DevOps, and how they fit into the world of financial systems:
- Breaking down the “wall of confusion” between development and operations, and extending Agile practices and values from development to operations
- Using automated configuration management tools like Chef, Puppet, CFEngine, or Ansible to programmatically provision and configure systems (Infrastructure as Code)
- Building Continuous Integration and Continuous Delivery (CI/CD) pipelines to automatically test and push out changes, and wiring security and compliance into these pipelines
- Using containerization and virtualization technologies like Docker and Vagrant, together with Infrastructure as Code, to create IaaS, PaaS, and SaaS clouds
- Running experiments, creating fast feedback loops, and learning from failure
To follow this book you need to understand a little about these ideas and practices. There is a lot of good stuff about DevOps out there, amid the hype. Read more…
Velocity 2013 Speaker Series
In 2002, US Secretary of State Donald Rumsfeld told a reporter that not only don’t we know everything important, but sometimes we don’t even know what knowledge we lack:
There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.
One of the purposes of monitoring is to build early-warning systems to alert of problems before they become serious. But how can we recognize a failure in its early stages? It’s a thorny question.
OSCON 2013 Speaker Series
Automating the configuration management of your operating systems and the rollout of your applications is one of the most important things an administrator or developer can do to avoid surprises when updating services, scaling up, or recovering from failures. However, it’s often not enough. Some of the most common operations that happen in your datacenter (or cloud environment) involve large numbers of machines working together and humans to mediate those processes. While we have been able to remove a lot of human effort from configuration, there has been a lack of software able to handle these higher-level operations.
I used to work for a hosted web application company where the IT process for executing an application update involved locking six people in a room for sometimes 3-4 hours, each person pressing the right buttons at the right time. This process almost always had a glitch somewhere where someone forgot to run the right command or something wasn’t well tested beforehand. While some technical solutions were applied to handle configuration automation, nothing that could perform configuration could really accomplish that high level choreography on top as well. This is why I wrote Ansible.
Ansible is a configuration management, application deployment, and IT orchestration system. One of Ansible’s strong points is having a very simple, human readable language – it allows users very fine, precise control over what happens on what machines at what times.
To get started, create an inventory file, for instance, ~/ansible_hosts that defines what machines you are managing, and which machines are frequently organized into groups. Ansible can also pull inventory from multiple cloud sources, but an inventory file is a quick way to get started:
[webservers] www01.example.com www02.example.com # add more webservers here [monitoring] nagios1.example.com [lbservers] haproxy1.example.com haproxy2.example.com
Now that you have defined what machines you are managing, you have to define what you are going to do on the remote machines.
Ansible calls this description of processes a “playbook,” and you don’t have to have just one, you could have different playbooks for different kinds of tasks.
Let’s look at an example for describing a rolling update process. This example is somewhat involved because it’s using haproxy, but haproxy is freely available. Ansible also includes modules for dealing with Netscalers and F5 load balancers, so this is just an example — ordinarily you would start more simply and work up to an example like this:
OSCON 2013 Speaker Series
Caching is the method that most improves response time in web applications (as Steve Souders shows in Cache is King), but in order to make use of it, every layer of your application must be configured for that purpose.
Most applications are initially developed with little or no use of caching and then must be refactored to fulfill performance goals. However, this approach incurs extra development costs that could be saved if response time is taken into consideration in the early stages of the development process.
The methodology that can save your life while you are still developing your application is pretty straightforward: keep caching in mind whenever handling data in your system. Either web APIs or internal backend data flows need to ask one simple question:
Can I survive if the data seen by the user is not the latest?
Sometimes the answer to this question is ‘no.’ For example, I would be fired very quickly if I built a bank system that showed more money than one consumer’s account really has. On the other hand, if the system interacts with general data services like social networks, news, weather, car traffic, etc., there is less need to ensure the latest piece of information is immediately shown to the user.
Of course, the latest data needs to eventually get to the user. Data cannot be too old or you risk confusing the user, but configuring a short expiration time (let’s say 5-10 minutes or less) for dynamic data that can support it can significantly improve the response time experience. That is called temporal consistency and it is crucial for having a successful caching strategy in place.
Nowadays, web applications are based on mashing up several web services coming from different sources. The best way to tackle different response times as well as data designs is to temporally cache those elements across all system layers. It is also applicable to data coming from your own system if the information needs to travel from one part of the world to another in several hops. If information is not critical, consider caching it at any intermediate stage and reuse when it is needed. Caching in the backend can avoid half of a trip. Even better would be to cache at the target device or a CDN system that can dispose of the full data trip or reduce it to only the last mile as an easy way to enhance performance.
A day in the life of DevOps, and the skills you'll need to enter the field.
In this Velocity podcast, OmniTI CEO Theo Schlossnagle discusses the skills of DevOps professionals and knowing how you've achieved excellence in the field.
Why "even faster" matters in the web performance and optimization world.
Steve Souders on the state of web performance, optimization and velocity.
A tribe of web performance and operations pros is pushing the web forward.
As we approach the fourth Velocity conference, here's a look at how the web performance and operations communities came together, what they've done to improve the web experience, and the work that lies ahead.