As applications move from on-premise to SaaS, the scale of deployments increases by orders of magnitude (to "web-scale"). At the same time, application development and operation become tightly integrated, and continuous deployment shrinks the update cycle from months to days or even hours.
The larger scale makes the health of SaaS applications mission-critical, even existential, to their providers, while the frequent updates increase the risk of failures. Monitoring and root cause analysis therefore become mission-critical functions as well, and more instrumentation is needed to ensure the application's quality of service. At the company I co-founded, we see customers using extensive, often tailored instrumentation that generates massive amounts of data (think hundreds of thousands of data streams and billions of data points per day).
Monitoring solutions that were sufficient for on-premise applications do not meet the needs of web-scale applications in terms of robustness, scalability, elasticity, and flexibility for dealing with custom metrics. This is why organizations deploying such applications often host their own solutions, built from open source components, for the two tasks a monitoring system needs to perform:
Data collection

This is relatively straightforward, since there are many open source tools that collect data generated by the resources being monitored. Applications can be instrumented by specialized APM tools or—increasingly—by the language framework and by developers interested in collecting application-specific "custom" metrics right in the application.
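To make the "custom metrics right in the application" idea concrete, here is a minimal sketch of in-application instrumentation: a decorator that records a call count and latency for any function it wraps. The metric names and the in-process store are illustrative only; a real setup would hand these measurements to a metrics client or agent.

```python
import time
from collections import defaultdict
from functools import wraps

# In-process stores for custom metrics (stand-ins for a real metrics client).
counters = defaultdict(int)
timings = defaultdict(list)

def instrumented(metric_name):
    """Decorator that records a call count and latency for the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                counters[metric_name + ".calls"] += 1
                timings[metric_name + ".latency_ms"].append(
                    (time.perf_counter() - start) * 1000)
        return wrapper
    return decorator

# Hypothetical application code instrumented with a custom metric.
@instrumented("checkout.process_order")
def process_order(order_id):
    return f"processed {order_id}"

process_order(42)
process_order(43)
print(counters["checkout.process_order.calls"])  # 2
```

The decorator keeps instrumentation out of the business logic itself, which is why this pattern shows up in many language frameworks and metrics libraries.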
Aggregation, analysis, alerting and storage
This set of tasks presents more of a challenge, because it needs its own processing and storage infrastructure in order to scale with the application and remain available even when the monitored application or its underlying infrastructure is in trouble. These functions are also generic, so it would be nice if all monitoring data could be consolidated in a single place.
At my day job we see a lot of these challenges firsthand and talk with many smart engineers, doing great work with good tools, who are nevertheless struggling with the scale and pace of developing web-scale, multitenant applications. They collect lots of metrics, for which they need aggregation, analysis, alerting, and storage, but don't always have the means or opportunity to scale those capabilities in step with their primary infrastructure.
Many of our customers come to us when their solution has reached the limits of its scalability and needs significant investment to be brought to the next level. We also count many "lean" startups amongst our customers, who have decided from the get-go to look for third-party solutions so they can focus on their core business and on getting the best instrumentation in place. This is why we believe that the future of monitoring data is in the cloud:
Companies will use cloud-based services for aggregation/analysis/alerting/storage of monitoring data. These services need to be robust, with redundant storage of monitoring data and the ability to survive failures in their underlying infrastructure. They also need to be highly scalable, provide a rich complement of features for data aggregation, anomaly detection, alerting, and root cause analysis, and readily integrate with other cloud-based tools in the tool chain. Finally, all of their features need to be accessible through RESTful APIs, so that they can be part of the ecosystem of tools that development and operations teams have at their disposal.
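As a sketch of what that RESTful access looks like, the snippet below assembles a single measurement as JSON in a generic shape. The endpoint, field names, and payload layout are hypothetical; each service defines its own API, but the pattern of POSTing structured measurements over HTTPS with an auth token is common to all of them.

```python
import json
import time

# Hypothetical endpoint -- real services define their own URLs and schemas.
API_URL = "https://metrics.example.com/v1/measurements"

def build_payload(name, value, source):
    """Assemble one measurement in a generic JSON shape for a metrics API."""
    return {
        "name": name,
        "value": value,
        "source": source,
        "measure_time": int(time.time()),
    }

payload = build_payload("app.requests.latency_ms", 42.7, "web-1")
body = json.dumps([payload])
# An HTTP client would then POST `body` to API_URL with an auth header, e.g.:
# requests.post(API_URL, data=body,
#               headers={"Authorization": "Bearer <token>"})
print(body)
```

Because the interface is plain HTTP and JSON, the same measurements can be submitted from agents, language frameworks, or ad hoc scripts, which is what makes these APIs composable with the rest of the tool chain.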
Companies will use extensive instrumentation for both infrastructure (using open source monitoring tools) and applications (with the aid of APM tools, language frameworks, and custom metrics). In the same way that data collection tools integrate with open source tools for aggregation/analysis/alerting/storage of monitoring data, the instrumentation will integrate with SaaS platforms that support these functions.
As SaaS platforms for aggregation/analysis/alerting/storage of monitoring data become more powerful, more companies will favor them over self-hosted solutions, for reasons similar to why SaaS solutions are becoming preferred over self-hosted solutions in many other areas. This in turn will further accelerate the evolution of these services.
Fred van den Bosch is CEO and co-founder of Librato. In previous lives he worked in operating systems development at the Computer Systems division of Philips Electronics, and in storage software development as EVP Engineering and CTO at VERITAS Software.