Why the data center needs an operating system

It’s time for applications — not servers — to rule the data center.


Developers today are building a new class of applications. These applications no longer fit on a single server, but instead run across a fleet of servers in a data center. Examples include analytics frameworks like Apache Hadoop and Apache Spark, message brokers like Apache Kafka, key-value stores like Apache Cassandra, as well as customer-facing applications such as those run by Twitter and Netflix.

These new applications are more than applications: they are distributed systems. Just as it became commonplace for developers to build multithreaded applications for single machines, it’s now becoming commonplace for developers to build distributed systems for data centers.

But it’s difficult for developers to build distributed systems, and it’s difficult for operators to run distributed systems. Why? Because we expose the wrong level of abstraction to both developers and operators: machines.

Machines are the wrong abstraction

Machines are the wrong level of abstraction for building and running distributed applications. Exposing machines as the abstraction to developers unnecessarily complicates the engineering, causing developers to build software constrained by machine-specific characteristics, like IP addresses and local storage. This makes moving and resizing applications difficult if not impossible, forcing maintenance in data centers to be a highly involved and painful procedure.

With machines as the abstraction, operators deploy applications in anticipation of machine loss, usually by taking the easiest and most conservative approach of deploying one application per machine. This almost always means machines go underutilized since we rarely buy our machines (virtual or physical) to exactly fit our applications, or size our applications to exactly fit our machines.

By running only one application per machine, we end up dividing our data center into highly static, highly inflexible partitions of machines, one for each distributed application. We end up with a partition that runs analytics, another that runs the databases, another that runs the web servers, another that runs the message queues, and so on. And the number of partitions is only bound to increase as companies replace monolithic architectures with service-oriented architectures and build more software based on microservices.

What happens when a machine dies in one of these static partitions? Let’s hope we over-provisioned sufficiently (wasting money), or can re-provision another machine quickly (wasting effort). What about when the web traffic dips to its daily low? With static partitions we allocate for peak capacity, which means when traffic is at its lowest, all of that excess capacity is wasted. This is why a typical data center runs at only 8-15% utilization. And don’t be fooled just because you’re running in the cloud: you’re still being charged for the resources your application is not using on each virtual machine (someone is benefiting — it’s just your cloud provider, not you).

And finally, with machines as the abstraction, organizations must employ armies of people to manually configure and maintain each individual application on each individual machine. People become the bottleneck for trying to run new applications, even when there are ample resources already provisioned that are not being utilized.

If my laptop were a data center

Imagine if we ran applications on our laptops the same way we run applications in our data centers. Each time we launched a web browser or text editor, we’d have to specify which CPU to use, which memory modules are addressable, which caches are available, and so on. Thankfully, our laptops have an operating system that abstracts us away from the complexities of manual resource management.

In fact, we have operating systems for our workstations, servers, mainframes, supercomputers, and mobile devices, each optimized for their unique capabilities and form factors.

We’ve already started treating the data center itself as one massive warehouse-scale computer. Yet, we still don’t have an operating system that abstracts and manages the hardware resources in the data center just like an operating system does on our laptops.

It’s time for the data center OS

What would an operating system for the data center look like?

From an operator’s perspective it would span all of the machines in a data center (or cloud) and aggregate them into one giant pool of resources on which applications would be run. You would no longer configure specific machines for specific applications; all applications would be capable of running on any available resources from any machine, even if there are other applications already running on those machines.

From a developer’s perspective, the data center operating system would act as an intermediary between applications and machines, providing common primitives to facilitate and simplify building distributed applications.

The data center operating system would not need to replace Linux or any other host operating systems we use in our data centers today. The data center operating system would provide a software stack on top of the host operating system. Continuing to use the host operating system to provide standard execution environments is critical to immediately supporting existing applications.

The data center operating system would provide functionality for the data center that is analogous to what a host operating system provides on a single machine today: namely, resource management and process isolation. Just like with a host operating system, a data center operating system would enable multiple users to execute multiple applications (made up of multiple processes) concurrently, across a shared collection of resources, with explicit isolation between those applications.

An API for the data center

Perhaps the defining characteristic of a data center operating system is that it provides a software interface for building distributed applications. Analogous to the system call interface for a host operating system, the data center operating system API would enable distributed applications to allocate and deallocate resources, launch, monitor, and destroy processes, and more. The API would provide primitives that implement common functionality that all distributed systems need. Thus, developers would no longer need to independently re-implement fundamental distributed systems primitives (and inevitably, independently suffer from the same bugs and performance issues).

Centralizing common functionality within the API primitives would enable developers to build new distributed applications more easily, more safely, and more quickly. This is reminiscent of when virtual memory was added to host operating systems. In fact, one of the virtual memory pioneers wrote that “it was pretty obvious to the designers of operating systems in the early 1960s that automatic storage allocation could significantly simplify programming.”

Example primitives

Two primitives specific to a data center operating system that would immediately simplify building distributed applications are service discovery and coordination. Unlike on a single host where very few applications need to discover other applications running on the same host, discovery is the norm for distributed applications. Likewise, most distributed applications achieve high availability and fault tolerance through some means of coordination and/or consensus, which is notoriously hard to implement correctly and efficiently.

Developers today are forced to pick between existing tools for service discovery and coordination, such as Apache ZooKeeper and CoreOS’ etcd. This forces organizations to deploy multiple tools for different applications, significantly increasing operational complexity and maintenance burden.

Having the data center operating system provide primitives for discovery and coordination not only simplifies development, it also enables application portability. Organizations can change the underlying implementations without rewriting the applications, much like you can choose between different filesystem implementations on a host operating system today.

A new way to deploy applications

With a data center operating system, a software interface replaces the human interface that developers typically interact with when trying to deploy their applications today; rather than a developer asking a person to provision and configure machines to run their applications, developers launch their applications using the data center operating system (e.g., via a CLI or GUI), and the application executes using the data center operating system’s API.

This supports a clean separation of concerns between operators and users: operators specify the amount of resources allocatable to each user, and users launch whatever applications they want, using whatever resources are available to them. Because an operator now specifies how much of any type of resource is available, but not which specific resource, a data center operating system, and the distributed applications running on top, can be more intelligent about which resources to use in order to execute more efficiently and better handle failures. Because most distributed applications have complex scheduling requirements (think Apache Hadoop) and specific needs for failure recovery (think of a database), empowering software to make decisions instead of humans is critical for operating efficiently at data-center scale.
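A toy first-fit placement loop illustrates the point: given the free capacity on whatever machines have room, software can pack tasks together instead of dedicating one machine per application. The machine names and CPU figures below are invented for illustration.

```python
# Sketch of software-driven placement versus static, human-driven
# partitioning: a first-fit scheduler packs tasks onto whichever
# machines currently have capacity. All names are illustrative.
def first_fit(machines: dict[str, float], tasks: list[tuple[str, float]]):
    """machines: machine -> free CPUs; tasks: (task name, CPUs needed)."""
    placement = {}
    for task, cpus in tasks:
        for machine, free in machines.items():
            if cpus <= free:
                machines[machine] -= cpus
                placement[task] = machine
                break
    return placement

machines = {"m1": 4.0, "m2": 4.0}
tasks = [("web", 2.0), ("db", 3.0), ("cron", 1.0)]
print(first_fit(machines, tasks))
# "web" and "cron" end up sharing m1 while "db" lands on m2: no machine
# is dedicated to a single application.
```

Real schedulers are far more sophisticated (constraints, preemption, failure recovery), but even this loop makes decisions no human could keep making at data-center scale and frequency.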

The “cloud” is not an operating system

Why do we need a new operating system? Didn’t Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) already solve these problems?

IaaS doesn’t solve our problems because it’s still focused on machines. It isn’t designed with a software interface intended for applications to use in order to execute. IaaS is designed for humans to consume, in order to provision virtual machines that other humans can use to deploy applications; IaaS turns machines into more (virtual) machines, but does not provide any primitives that make it easier for a developer to build distributed applications on top of those machines.

PaaS, on the other hand, abstracts away the machines, but is still designed first and foremost to be consumed by a human. Many PaaS solutions do include numerous tangential services and integrations that make building a distributed application easier, but not in a way that’s portable across other PaaS solutions.

Apache Mesos: The distributed systems kernel

Distributed computing is now the norm, not the exception, and we need a data center operating system that delivers a layer of abstraction and a portable API for distributed applications. Not having one is hindering our industry. Developers should be able to build distributed applications without having to reimplement common functionality. Distributed applications built in one organization should be capable of being run in another organization easily.

Existing cloud computing solutions and APIs are not sufficient. Moreover, the data center operating system API must be built, like Linux, in an open and collaborative manner. Proprietary APIs force lock-in, deterring a healthy and innovative ecosystem from growing. It’s time we created the POSIX for distributed computing: a portable API for distributed systems running in a data center or on a cloud.

The open source Apache Mesos project, of which I am one of the co-creators and the project chair, is a step in that direction. Apache Mesos aims to be a distributed systems kernel that provides a portable API upon which distributed applications can be built and run.

Many popular distributed systems have already been built directly on top of Mesos, including Apache Spark, Apache Aurora, Airbnb’s Chronos, and Mesosphere’s Marathon. Other popular distributed systems have been ported to run on top of Mesos, including Apache Hadoop, Apache Storm, and Google’s Kubernetes, to list a few.

Chronos is a compelling example of the value of building on top of Mesos. Chronos, a distributed system that provides highly available and fault-tolerant cron, was built on top of Mesos in only a few thousand lines of code and without having to do any explicit socket programming for network communication.
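The sketch below captures that idea in miniature: a cron-like service that, when a job comes due, hands the command to a kernel-provided "run this somewhere" primitive. This is not the real Chronos code; the `launch` callable stands in for what the underlying kernel would actually provide, which is precisely why the application itself contains no socket programming.

```python
# Toy sketch of the Chronos idea: distributed cron built on a kernel
# primitive. `launch` is a stand-in for a data-center-OS call that runs
# a command somewhere in the cluster; the kernel owns all networking
# and placement. All names here are illustrative.
class DistributedCron:
    def __init__(self, launch):
        self.launch = launch      # kernel-provided "run this somewhere"
        self.jobs = []            # each job: [next_run_ts, interval_s, command]

    def add_job(self, command: str, interval_s: int, now: float) -> None:
        self.jobs.append([now + interval_s, interval_s, command])

    def tick(self, now: float) -> list[str]:
        # Launch every job whose deadline has passed, then reschedule it.
        launched = []
        for job in self.jobs:
            if now >= job[0]:
                self.launch(job[2])
                job[0] = now + job[1]
                launched.append(job[2])
        return launched
```

High availability would come from running several replicas of this scheduler and using the kernel's coordination primitive to keep exactly one active, again without the application writing any of that plumbing itself.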

Companies like Twitter and Airbnb are already using Mesos to help run their data centers, while companies like Google have been using in-house solutions they built almost a decade ago. In fact, just like Google’s MapReduce spurred an industry around Apache Hadoop, Google’s in-house data center solutions have had close ties with the evolution of Mesos.

While not a complete data center operating system, Mesos, along with some of the distributed applications running on top, provide some of the essential building blocks from which a full data center operating system can be built: the kernel (Mesos), a distributed init.d (Marathon/Aurora), cron (Chronos), and more.

Interested in learning more about or contributing to Mesos? Check out mesos.apache.org and follow @ApacheMesos on Twitter. We’re a growing community with users at companies like Twitter, Airbnb, Hubspot, OpenTable, eBay/Paypal, Netflix, Groupon, and more.

Cropped image on article and category pages by Karl-Ludwig Poggemann on Flickr, used under a Creative Commons license.




  • Lachlan


    >>Many PaaS solutions do include numerous tangential services and
    >>integrations that make building a distributed application easier, but
    >>not in a way that’s portable across other PaaS solutions.

    It seems that your objection to a PaaS-like solution could be summarized as “a lack of standardization”. If the industry were to standardize on a particular PaaS, with all the appropriate base-level services/APIs filled out, would you have any additional objections? Put another way, why would a Mesos-based data center OS be superior to a PaaS-like solution?

    • A Mesos-based solution could support multiple PaaS-like frameworks all running multi-tenant. We’re already running Mesos clusters with DEIS, Marathon and Kubernetes. In what future will 100% of your workloads be in any one framework? The point of an OS, to some extent, is to make it infinitely extensible in this way.

      • Lachlan

        re-read the assumptions stated in my question.

        >>If the industry were to standardize on a particular PaaS

        highly unlikely but humor me – this is purely a hypothetical. assume there is a PaaS to end all PaaSes.

        as a developer, a PaaS-like solution seems more oriented to my needs and the way I want to define and consume a platform. and any good PaaS is extensible.

        what is it that Mesos offers that would make me cheer “yeah, go Mesos!”? or is the lack of standardization the key motivation? or should we treat them as separate, necessary layers co-existing to address separate users and reqs?

        my original question was actually prompted by this isolated point in the article:

        >>PaaS… but is still designed first and foremost to be
        >>consumed by a human

        what’s so bad about that?

        • Lachlan

          actually, think I answered my own question here…

      • Kishorekumar Neelamegam

        I see emerging standards like CAMP (Cloud Application Management for Platforms) and TOSCA (Topology and Orchestration Specification for Cloud Applications) trying to standardize PaaS. What is your take on this?

        I think this isn’t answered.
        “If the industry were to standardize on a particular PaaS, with all the appropriate base-level services/APIs filled out, would you have any additional objections? Put another way, why would a Mesos-based data center OS be superior to a PaaS-like solution?”

        I only see the discussion going after grid computing? I don’t know how a developer can use it as easily as a PaaS.

  • Mati

    Really nice article. What about solutions such as Nutanix and its NOS? Doesn’t it already answer quite a lot of the needs you highlighted? And OK, it’ll be great to be software agnostic, but, for example, how do you switch on the LED on a faulty disk in a NetApp, HP blade, SM, etc., all at the same time?

  • I would humbly suggest that the ideal level of abstraction should be automatically determined by the computer. At this point in time, I suspect that Google, Microsoft and AWS, either individually or perhaps altogether, have enough data about successful and failed distributed implementations, to build an artificially intelligent OS for distributed computing (AIOSD).

    I am now working on a website that is hosted within a virtual machine on one of the physical servers owned and maintained by my client. If this virtual environment was in fact an AIOSD based interface, there would be a small icon at the top right of the screen, that via a flyout, would allow me to specify all of the desired characteristics of my web application.

    Defaults for all of the options would appear, along with a calculated price at the bottom of the flyout. So, I would indicate that I want peak performance to be able to handle 1 million simultaneous users. Based on an analysis of my website and historical data about every distributed application ever built, AIOSD would automatically create the necessary clusters, dedicate sufficient memory, set aside the right amount and type of disk storage and of course would automatically monitor all of this to see if there were signs of trouble. And most likely, if there were signs of trouble, the necessary changes would be made without bothering the user.

    This is in some ways equivalent to background garbage collection that is not even accessible to the developer. While the developer could make changes in the defaults, it would likely be that no human could better predict resource use and need, than the AIOSD system.

    A developer would specify if an application was private or public. If public, then anyone could reach the application via the appropriate address and begin working. Once the developer has finished testing the application on his or her “local” VM, then a one click deploy command would take care of the rest. And if overnight, this app would reach 100 million users, the developer would not have to worry about any difficulties.

    AIOSD would come with a huge API that effectively covered every single functionality that a developer could need. Email, SMS, chat, image storage, links to every social media and so on, would be available via AIOSD. By using the AIOSD version of these functionalities, the developer would know that they would scale without problem. So if the developer wants an email to be sent back to the user after initial login, the developer would not have to worry that the million users who connected overnight, would have any difficulties. AIOSD would automatically handle 1 million or 100 million simultaneous emails going out to every user.

    The truth is that developers really should already be working with totally abstracted interfaces. While there are systems like this, Visual Studio for example still requires you to place buttons on the screen and write server-side code. Instead, the default interface should already be a Lego-style development environment. How many versions of a login page do we need? Once again in the Visual Studio environment, there should be a login page creator that includes every possible functionality that developers have ever used on a login page. And this login page should already be designed for compatibility with a distributed system like Azure.

    I guess the best analogy I can use is old and faithful Microsoft Access. In five minutes, a person familiar with the system can build a multiform database application that is effective. Take this Access application, upload the tables to SQL Server and then link them back into the front-end, and you have a classic client/server environment. It should be possible via MS Access to click a button and have the entire environment hosted on a virtual machine in the cloud. The developer could then tell the thousands or tens of thousands of employees in the company that they can connect to the application and use it.

    I truly hope we will get to this situation sooner rather than later. The amount of time being wasted on re-creating previously built solutions for distributed computing is horrifically huge. We have to finally standardize on certain defaults, all nearly hidden by a AIOSD style OS. The barriers to programming distributed systems will fall if we follow this path. And then, the distance between idea and real, functioning applications will be drastically reduced. Imagine building the next Facebook in an afternoon, never having to worry about anything except for your design.

    Sorry this was so long. Thanks for listening.

  • John

    OpenVMS clusters do most of what’s discussed here. Languages for VMS conform to an OpenVMS calling standard, which means that routines are easily available. Shared disks and the use of logical names means that applications can be run on any machine in the cluster. Intracluster communications methods allow programs to talk to each other. Clusters can be expanded without downtime.

  • This is the kind of “one size fits all” thinking that’s been debunked over and over. Folks building new technologies always like to believe that their technology solves all problems. Reality is much more nuanced than this. There is absolutely a place for this kind of application orchestration / cluster management. In fact, I think this area has been woefully ignored.

    But that doesn’t necessarily mean that machines as abstractions need to go away. There are workloads that need the machine as an abstraction (HPC and Big Data come to mind). At the very least there are workloads that must understand the performance characteristics of the abstraction they are deployed into.

    One problem I see with this positioning, and let’s be honest that’s what it is, is that it ignores some of the technical realities. Folks who manage containers sometimes like to pretend that the container is an abstraction that removes all need to understand underlying hardware, but it doesn’t. For example, if I move a container from a server with 10G of networking to one with 1G of networking, I immediately face a potential bottleneck. Servers, storage, and networking can be abstracted, but they can’t be ignored. All servers are not equal. All storage is not equal. All networking is not equal. Someone, somewhere, has to pay attention to these inequalities and handle it.

    Traditionally, this is handled with a QoS system that manages everything down to a lowest common denominator. The problem with such systems is that they frequently leave underutilized resources. In this manner, managing machines as abstractions, while fundamentally more complicated, provides greater flexibility and much more granularity over resource allocation.

    This is really a “right tool for the job” type situation, where application containers as abstraction is right in one situation and machines as abstraction is right in another.

    There is no spoon.

  • maymounkov

    Dear Benjamin, your article describes nearly word for word the software http://gocircuit.org You will find much of the insight of your article and more in the following original white papers for the circuit: http://gocircuit-org.appspot.com/scalefree.html on scale-free engineering.

  • Ivo Vachkov

    Plan9 anyone?! :)

  • Hermann ‘supermatrix’ Djoumess

    Really interesting…

  • clb

    An API (application programming interface) for the data center, would it not be a DOPI (devops programming interface)? Come on, we do need some FLAs to replace those old TLAs.