As sysadmins, we have been responsible for running applications for decades. We have done everything to meet demanding SLAs, including “automating all the things” and even trading sleep cycles to rescue applications from production fires. While we have earned many battle scars and can step back and admire fully automated deployment pipelines, it feels like there has always been something missing. Our infrastructure still feels like an accident waiting to happen and somehow, no matter how much we manage to automate, the expense of infrastructure continues to increase.
The root of this feeling is that many of our tools don’t provide proper insight into what’s really going on, so we have to reverse engineer applications in order to monitor them effectively and recover from failures. Today many people bolt on monitoring solutions that probe applications from the outside and report “health” status to a centralized monitoring system. That system often ends up riddled with false alarms, or with alarms that are not worth looking into because there is no clear path to resolution.
What makes this worse is how we typically handle common failure scenarios such as node failures. Today many of us are forced to statically assign applications to machines and manage resource allocations on a spreadsheet. It’s very common to assign a single application to a VM to avoid dependency conflicts and ensure proper resource allocations. Many of the tools in our tool belt have been optimized for this pattern, and the results are less than optimal. Sure, this is better than doing it manually, but current methods result in low resource utilization, which means our EC2 bills continue to increase, because the more you automate, the more things people want to do.
How do we reverse course on this situation? How do we gain more insight and increase resource utilization without rewriting all of our applications? That’s where application schedulers and management frameworks like Marathon and Kubernetes come in. If you have not heard of these tools, you owe it to yourself to check them out. I’ll focus on Kubernetes here, since that’s the cluster manager I use.
Taken from the website:
Kubernetes is an open source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the user’s declared intentions. Using the concepts of “labels” and “pods”, it groups the containers, which make up an application, into logical units for easy management and discovery.
Kubernetes is designed to automate application deployments and keep them up and running by “redeploying” applications from failed nodes. Kubernetes also helps lower those EC2 bills: its resource scheduler packs applications onto a fleet of machines so that each machine’s resources are used efficiently. Say goodbye to statically assigning applications to machines and scripting common failover scenarios for the 100th time. Let Kubernetes handle that. I should also mention that Kubernetes provides robust logging and a set of monitoring features that are sure to uncover more details on how your applications perform in production. The best part: most of this stuff works out of the box, and since it’s fully integrated, you’ll spend less time setting things up and more time focused on valuable things, like going home on time.
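To make the packing and “redeploying” ideas concrete, here is a toy sketch in Python. It is not how the Kubernetes scheduler is actually implemented; it is a simple first-fit heuristic that I’m using only to illustrate the idea. The `Node` class, the function names, the app names, and the CPU numbers are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical machine with a fixed amount of CPU to hand out."""
    name: str
    cpu_capacity: float  # CPU cores on the machine
    apps: dict = field(default_factory=dict)  # app name -> CPU cores requested

    @property
    def cpu_free(self) -> float:
        return self.cpu_capacity - sum(self.apps.values())


def schedule(requests: dict, nodes: list) -> list:
    """Place each app on the first node with enough free CPU.

    Apps are considered largest request first. Returns the names of
    apps that did not fit anywhere.
    """
    unplaced = []
    by_size = sorted(requests.items(), key=lambda kv: kv[1], reverse=True)
    for app, cpu in by_size:
        target = next((n for n in nodes if n.cpu_free >= cpu), None)
        if target is None:
            unplaced.append(app)
        else:
            target.apps[app] = cpu
    return unplaced


def handle_node_failure(failed: Node, nodes: list) -> list:
    """Move a failed node's apps onto the surviving nodes."""
    survivors = [n for n in nodes if n is not failed]
    displaced, failed.apps = failed.apps, {}
    return schedule(displaced, survivors)


def placement(nodes: list) -> dict:
    """Snapshot of which apps run where, for easy inspection."""
    return {n.name: sorted(n.apps) for n in nodes}


# Illustrative fleet and workload (assumed values, not real measurements).
nodes = [Node("node-a", 4.0), Node("node-b", 4.0), Node("node-c", 4.0)]
requests = {"web": 1.0, "api": 2.0, "worker": 1.5, "cache": 0.5, "cron": 0.5}

unplaced = schedule(requests, nodes)
before = placement(nodes)

# Simulate losing node-a and let the scheduler find new homes for its apps.
lost = handle_node_failure(nodes[0], nodes)
after = placement(nodes)

print(before)
print(after)
```

In this sketch, five applications that would have taken five VMs under the one-app-per-VM pattern fit on two machines, and when one machine fails its applications land on the remaining ones without anyone touching a spreadsheet. A real cluster manager weighs far more than CPU, but the shift from static assignment to declared resource requests is the same.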
If you want to learn more about Kubernetes, then be sure to check out the tutorial I’ll be leading at OSCON Amsterdam – Taming Microservices with CoreOS and Kubernetes.
You can also attend:
- Who are you and what did you do with my containers?
- Deploy your applications to the cloud using containers
Public domain glass honeycomb image via Pixabay.