Exploring lightweight monitoring systems

Toward unifying customer behavior and operations metrics.

For the last ten years I’ve had a foot in both the development and operations worlds. Shortly after starting at ThoughtWorks, I stumbled into IT operations because I had the most UNIX skills on the team. I was fortunate enough to do so at a time when many of my ThoughtWorks colleagues and I were working on the ideas that were captured so well in Jez Humble and Dave Farley’s Continuous Delivery (Addison-Wesley).

During this time, our focus was on getting our application into production as quickly as possible. We were butting up against the limits of infrastructure automation, and IaaS providers like Amazon existed only in their earliest form.

Recently, I have spent time with operations teams who are most concerned with the longer-term challenges of looking after increasingly complex ecosystems of interconnected systems. Here the focus is on immediate feedback and knowing whether they need to take action. At a certain scale, complex IT ecosystems can seem to exhibit emergent behavior, like an organism. The operations world has evolved a series of tools that allow these teams to see what’s happening *right now* so they can react, keep things running, and keep people happy.

At the same time, those of us who spend time thinking about how to quickly and effectively release our applications have become preoccupied with knowing whether that software does what our customers want once it is released. The Lean Startup movement has shown us the importance of putting our software in front of our customers, then working out how they actually use it so we can determine what to do next. Here I was struck by the shortcomings of the available tools. Commonly used web analytics tools, for example, might only help me understand tomorrow how my customers used my site today.

Many people have realized that the monitoring tools available in the IT operations space are significantly better at showing us, in real time, how our applications are performing, while the traditional analytics tools often let us down. In Lightweight Systems for Realtime Monitoring, I take a look at a number of easy-to-use, focused monitoring tools from this world. These tools, when applied to understanding our customers’ behavior, can put all the information we need at our fingertips today rather than tomorrow.

In the short ebook I also explore where these systems are going: away from a world of siloed metrics in isolated systems, and toward systems like Suro or Riemann, which can handle generic streams of events. By unifying customer behavioral metrics and IT operations metrics, we can potentially simplify our system architectures, improve our understanding of how our systems behave, and work out what to build next.
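To make the idea of a generic event stream a little more concrete, here is a minimal sketch in Python. It is not the API of Suro or Riemann; the `Event` fields and the `RollingWindow` class are hypothetical names chosen for illustration, and the sketch assumes events for each service arrive in time order. The point is only that a customer event (a checkout) and an operations event (a latency measurement) can share one shape and one pipeline.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A generic event. Field names are illustrative assumptions."""
    time: float          # seconds, on any consistent clock
    service: str         # e.g. "web.latency_ms" or "shop.checkout"
    metric: float = 1.0  # a measured value, or 1.0 for a simple occurrence
    tags: tuple = ()     # free-form labels such as "ops" or "customer"


class RollingWindow:
    """Keeps the most recent `window` seconds of events per service."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.events = defaultdict(deque)

    def add(self, event: Event) -> None:
        queue = self.events[event.service]
        queue.append(event)
        # Drop events that fall outside the window ending at the newest event.
        while queue and queue[0].time <= event.time - self.window:
            queue.popleft()

    def count(self, service: str) -> int:
        return len(self.events[service])

    def mean(self, service: str) -> float:
        queue = self.events[service]
        if not queue:
            return 0.0
        return sum(e.metric for e in queue) / len(queue)


# Customer and operations events flow through the same stream.
stream = [
    Event(time=0, service="web.latency_ms", metric=120, tags=("ops",)),
    Event(time=5, service="shop.checkout", tags=("customer",)),
    Event(time=30, service="web.latency_ms", metric=80, tags=("ops",)),
    Event(time=70, service="shop.checkout", tags=("customer",)),
    Event(time=75, service="web.latency_ms", metric=100, tags=("ops",)),
]

window = RollingWindow(window=60.0)
for event in stream:
    window.add(event)

print("checkouts in the last minute:", window.count("shop.checkout"))
print("mean latency in the last minute:", window.mean("web.latency_ms"))
```

Because both kinds of event pass through the same window, the question "how many checkouts happened in the last minute?" is answered by the same machinery, and at the same moment, as "what is the current mean latency?".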
