Open Source and Cloud Computing

I’ve been worried for some years that the open source movement might fall prey to the problem that Kim Stanley Robinson so incisively captured in Green Mars: “History is a wave that moves through time slightly faster than we do.” Innovators are left behind, as the world they’ve changed picks up on their ideas, runs with them, and takes them in unexpected directions.

In essays like The Open Source Paradigm Shift and What is Web 2.0?, I argued that the success of the internet as a non-proprietary platform built largely on commodity open source software could lead to a new kind of proprietary lock-in in the cloud. What good are free and open source licenses, all based on the act of software distribution, when software is no longer distributed but merely performed on the global network stage? How can we preserve freedom to innovate when the competitive advantage of online players comes from massive databases created via user contribution, which literally get better the more people use them, raising seemingly insuperable barriers to new competition?

I was heartened by the program at this year’s Open Source Convention. Over the past couple of years, open source programs aimed at the Web 2.0 and cloud computing problem space have been proliferating, and I’m seeing clear signs that the values of open source are being reframed for the network era. Sessions like Beyond REST? Building Data Services with XMPP PubSub, Cloud Computing with BigData, Hypertable: An Open Source, High Performance, Scalable Database, Supporting the Open Web, and Processing Large Data with Hadoop and EC2 were all full. (Due to enforcement of fire regulations at the Portland Convention Center, many of them had people turned away, since standing room was not allowed. Brian Aker’s session on Drizzle was so popular that he gave it three times!)

But just “paying attention” to cloud computing isn’t the point. The point is to rediscover what makes open source tick, but in the new context. It’s important to recognize that open source has several key dimensions that contribute to its success:

  1. Licenses that permit and encourage redistribution, modification, and even forking;
  2. An architecture that enables programs to be used as components wherever possible, and extended rather than replaced to provide new functionality;
  3. Low barriers for new users to try the software;
  4. Low barriers for developers to build new applications and share them with the world.

This is far from a complete list, but it gives food for thought. As outlined above, I don’t believe we’ve figured out what kinds of licenses will allow forking of Web 2.0 and cloud applications, especially because the lock-in these applications provide comes from their data rather than their code. However, there are hopeful signs, like Yahoo! Boss, that companies are beginning to understand that in the era of the cloud, open source without open data is only half the application.

But even open data is fundamentally challenged by the idea of utility computing in the cloud. Jesse Vincent, the guy who’s brought out some of the best hacker t-shirts ever (as well as RT), put it succinctly: “Web 2.0 is digital sharecropping.” (Googling, I discover that Nick Carr seems to have coined this meme back in 2006!) If this is true of many Web 2.0 success stories, it’s even more true of cloud computing as infrastructure. I’m ever mindful of Microsoft Windows Live VP Debra Chrapaty’s dictum that “In the future, being a developer on someone’s platform will mean being hosted on their infrastructure.” The New York Times dubbed bandwidth providers “OPEC 2.0.” How much more will that become true of cloud computing platforms?

That’s why I’m interested in peer-to-peer approaches to delivering internet applications. Jesse Vincent’s talk, Prophet: Your Path Out of the Cloud, describes a system for federated sync; Evan Prodromou’s Open Source Microblogging describes identi.ca, a federated open source approach to lifestreaming applications.
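To make the federation point concrete, here is a minimal sketch (it is not Prophet’s or identi.ca’s actual design, and every name in it is invented for illustration): each peer keeps a full replica with a little version metadata, and any two peers can reconcile directly, with no central service in the loop.

    # Toy federated sync: each peer holds a full replica plus a logical
    # timestamp per record, and any two peers reconcile directly.
    class Peer:
        def __init__(self, name):
            self.name = name
            self.clock = 0
            self.records = {}  # key -> (logical_clock, peer_name, value)

        def write(self, key, value):
            # Local edits bump this peer's logical clock.
            self.clock += 1
            self.records[key] = (self.clock, self.name, value)

        def pull_from(self, other):
            # Take any record the other peer has a newer version of; ties
            # break on peer name so replicas converge deterministically.
            for key, incoming in other.records.items():
                if key not in self.records or incoming[:2] > self.records[key][:2]:
                    self.records[key] = incoming
            self.clock = max(self.clock, other.clock)

        def sync_with(self, other):
            # Symmetric exchange: afterwards both replicas agree.
            self.pull_from(other)
            other.pull_from(self)

    laptop, phone = Peer("laptop"), Peer("phone")
    laptop.write("bug-42", "open")
    phone.write("bug-42", "closed")  # concurrent edit on another peer
    laptop.sync_with(phone)          # no server involved
    assert laptop.records == phone.records

The architectural point is that because reconciliation is pairwise, no single host owns the authoritative copy of the data, which is exactly the property a centrally hosted silo takes away.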

We can talk all we like about open data and open services, but frankly, it’s important to realize just how much of what is possible is dictated by the architecture of the systems we use. Ask yourself, for example, why the PC wound up with an ecosystem of binary freeware, while Unix wound up with an ecosystem of open source software. It wasn’t just ideology; it was that the fragmented hardware architecture of Unix required source code so that users could compile applications for their own machines. Why did the WWW end up with hundreds of millions of independent information providers while centralized sites like AOL and MSN faltered?

Take note: all of the platform-as-a-service plays, from Amazon’s S3 and EC2 and Google’s AppEngine to Salesforce’s force.com (not to mention Facebook’s social networking platform), have a lot more in common with AOL than they do with internet services as we’ve known them over the past decade and a half. Will we have to spend a decade backtracking from centralized approaches? The interoperable internet should be the platform, not any one vendor’s private preserve. (Neil McAllister provides a look at just how one-sided most platform-as-a-service contracts are.)

So here’s my first piece of advice: if you care about open source for the cloud, build on services that are designed to be federated rather than centralized. Architecture trumps licensing any time.

But peer-to-peer architectures aren’t as important as open standards and protocols. If services are required to interoperate, competition is preserved. Despite all of Microsoft’s and Netscape’s efforts to “own” the web during the browser wars, they failed because Apache held the line on open standards. This is why the Open Web Foundation, announced last week at OSCON, is putting an important stake in the ground. It’s not just open source software for the web that we need, but open standards that will ensure that dominant players still have to play nice.

The “internet operating system” that I’m hoping to see evolve over the next few years will require developers to move away from thinking of their applications as endpoints, and more as re-usable components. For example, why does every application have to try to recreate its own social network? Shouldn’t social networking be a system service?

This isn’t just a “moral” appeal, but strategic advice. The first provider to build a reasonably open, re-usable system service in any particular area is going to get the biggest uptake. Right now, there’s a lot of focus on low level platform subsystems like storage and computation, but I continue to believe that many of the key subsystems in this evolving OS will be data subsystems, like identity, location, payment, product catalogs, music, etc. And eventually, these subsystems will need to be reasonably open and interoperable, so that a developer can build a data-intensive application without having to own all the data his application requires. This is what John Musser calls the programmable web.
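As a rough sketch of what that might look like from the developer’s side, imagine an application stitched together from data subsystems it doesn’t own. Everything below is hypothetical: the hosts, endpoints, and field names are invented purely for illustration, and the only real point is that the application is nothing but the composition of other providers’ services.

    # Hypothetical "programmable web" composition: identity, location, and
    # a product catalog each live with a different provider, reached over
    # plain HTTP + JSON. None of these endpoints are real.
    import json
    from urllib.parse import quote
    from urllib.request import urlopen

    IDENTITY_SERVICE = "https://identity.example.com/v1/users/"
    PLACES_SERVICE = "https://places.example.com/v1/nearby"
    CATALOG_SERVICE = "https://catalog.example.com/v1/products"

    def fetch_json(url):
        # Any provider that honors the same interface could stand in here.
        with urlopen(url) as response:
            return json.load(response)

    def storefront_for(user_id):
        # The application owns none of this data; it only composes it.
        user = fetch_json(IDENTITY_SERVICE + quote(user_id))
        places = fetch_json(f"{PLACES_SERVICE}?lat={user['lat']}&lng={user['lng']}")
        stock = fetch_json(f"{CATALOG_SERVICE}?near={quote(places[0]['id'])}")
        return {
            "user": user["name"],
            "store": places[0]["name"],
            "products": stock["products"],
        }

Whether those three hosts end up interoperable and swappable, or each becomes a private preserve, is the open question.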

Note that I said “reasonably open.” Google Maps isn’t open source by any means, but it was open enough (considerably more so than any preceding web mapping service) that it became a key component of a whole generation of new applications that no longer needed to do their own mapping. A quick look at programmableweb.com shows Google Maps with about a 90% share of mapping mashups. Google Maps is proprietary, but it is reusable. A key test of whether an API is open is whether it is used to enable services that are not hosted by the API provider and are distributed across the web. Facebook’s APIs enable applications on Facebook; Google Maps is a true programmable web subsystem.
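To apply that test in practice: if code running on any host you like can call the API and build something the provider doesn’t operate, it passes. The sketch below uses Google’s present-day Geocoding web service as the example endpoint; the API key is a placeholder, and the details of the 2008-era Maps API differed.

    # Sketch of the "not hosted by the API provider" test: this script can
    # run anywhere on the web and still use the mapping subsystem.
    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
    API_KEY = "YOUR_KEY_HERE"  # placeholder

    def geocode(address):
        # Turn a street address into coordinates via a service we don't host.
        query = urlencode({"address": address, "key": API_KEY})
        with urlopen(f"{GEOCODE_URL}?{query}") as response:
            result = json.load(response)["results"][0]
        return result["geometry"]["location"]  # {"lat": ..., "lng": ...}

    print(geocode("1005 Gravenstein Highway North, Sebastopol, CA"))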

That being said, even though the cloud platforms themselves are mostly proprietary, the software stacks running on them are not. Thorsten von Eicken of RightScale pointed out in his talk, Scale Into the Cloud, that almost all of the software stacks running on cloud computing platforms are open source, for the simple reason that proprietary software licenses have no provisions for cloud deployment. Even though open source licenses don’t prevent lock-in by cloud providers, they do at least allow developers to deploy their work on the cloud.

In that context, it’s important to recognize that even proprietary cloud computing provides one of the key benefits of open source: low barriers to entry. Derek Gottfried’s Processing Large Data with Hadoop and EC2 talk was especially sweet in demonstrating this point. Derek described how, armed with a credit card, a sliver of permission, and his hacking skills, he was able to put the NY Times historical archive online for free access, ramping up from 4 instances to nearly 1,000. Open source is about enabling innovation and re-use, and at their best, Web 2.0 and cloud computing can be bent to serve those same aims.
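For a sense of just how low that barrier is, here is a rough sketch of the pattern Derek described, written against today’s boto3 SDK rather than the 2008-era tools he actually used; the AMI id and instance type are placeholders, not details from his talk.

    # With an AWS account and a credit card, requesting compute is a few
    # lines, and scaling from 4 instances to hundreds is a parameter change
    # rather than a procurement cycle. (boto3; placeholder AMI and type.)
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    def launch_workers(count, image_id="ami-0123456789abcdef0"):
        # Ask EC2 for `count` identical worker instances.
        return ec2.create_instances(
            ImageId=image_id,
            MinCount=count,
            MaxCount=count,
            InstanceType="m5.large",
        )

    workers = launch_workers(4)       # start small...
    # workers = launch_workers(1000)  # ...and ramp up when the job demands it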

Yet another benefit of open source, try-before-you-buy viral marketing, is also possible for cloud application vendors. During one venture pitch, I asked the company how they’d avoid the high sales costs typically associated with enterprise software. Open source has solved this problem by letting companies build a huge pipeline of free users, whom they can then upsell with follow-on services. The cloud answer isn’t quite as good, but at least there is one: some number of application instances are free, and you charge after that. While this business model loses some virality and transfers some costs from the end user to the application provider, it has a benefit that open source now lacks: a much stronger upgrade path to paid services. Only time will tell whether open source or cloud deployment is a better distribution vector, but it’s clear that both are miles ahead of traditional proprietary software in this regard.

In short, we’re a long way from having all the answers, but we’re getting there. Despite all the possibilities for lock-in that we see with Web 2.0 and cloud computing, I believe that the benefits of openness and interoperability will eventually prevail, and we’ll see a system made up of cooperating programs that aren’t all owned by the same company: an internet platform that, like Linux on the commodity PC architecture, is assembled from the work of thousands. Those who are skeptical of the idea of the internet operating system argue that we’re missing the kinds of control layers that characterize a true operating system. I like to remind them that much of the software that is today assembled into a Linux system already existed before Linus wrote the kernel. Like LA, famously described as 72 suburbs in search of a city, today’s web is 72 subsystems in search of an operating system kernel. When we finally get that kernel, it had better be open source.
