OSCON day 2: Prophet, your path out of the cloud

Some of you may know Jesse Vincent as the guy who hands out snarky t-shirts like last year’s “My free software runs your business” shirt. But today I got to see Jesse’s more serious side when I attended his “Prophet, your path out of the cloud” presentation. He started his session by outlining why cloud computing may not be the best idea and then went on to talk about his new distributed database called Prophet.

Since I’ve been pondering hosting MusicBrainz’ web services at EC2, I found his analogy of cloud computing as “digital sharecropping” quite apt. Wikipedia defines sharecropping as: “Sharecropping is a system of agriculture or agricultural production in which a landowner allows a tenant to use the land in return for a share of the crop produced on the land (e.g., 50 percent of the crop).” History tells us that sharecropping didn’t work out so well for the farmers and that a lot of the farmers were dependent on the landowners and heavily in debt to them.

In the beginning of computing people ran programs they didn’t own on machines they didn’t own (mainframes were leased from the manufacturer). People had no control over when these machines got updates and had very little control in general. In the 80’s things got better as PCs started appearing, only to lock users into things like Windows. And today people don’t need to have servers, software or anything else — just a web browser to host and run web-sites thanks to cloud computing.

So, what happens when they go down? Your web-site or perhaps your business stops dead in its tracks as we saw last week when Amazon’s S3 service went kaputt. Also, how do you trust your service provider to not send a copy of all of your email to the Chinese secret police? What if you get shut out because your service provider disagrees with what you are doing?

In short, if your provider shuts you out, you’re screwed! Then what do you do when your work is tied to that provider (e.g. EC2 AMIs)? Jesse’s message is that if you use hosted applications, you’re going to get burned. Maybe not today, but at some point it will catch up with you! This thought motivated him to work on a distributed database that needs no hosted servers and can live with data being synchronized via sneaker-net.

Jesse describes Prophet as a grounded database, since it runs on the edge of the network — not in the cloud! It syncs with services that you already use (mostly bug tracking systems since Jesse is the father of RT). Replicated services are called Foreign Replicas.

Prophet is semi-relational; it is possible to do relational joins, but they are expensive. Prophet follows a model similar to Amazon’s SimpleDB to keep things nice and simple. But the most significant feature of Prophet is that it is peer-to-peer distributed: you can update data in any of its replicated copies, re-sync the databases later, and have conflicts between two copies of the data resolved automatically. You can pull from a replica or push to any replica and the changes will propagate properly. Changes can be pushed via rsync or a standard filesystem and pulled via HTTP.

Prophet is also disconnected, which allows replicas to live disconnected from other databases for prolonged periods of time. Not all databases will have constant network connections, and Prophet handles this case so that replicas can live on USB keys, for instance. Prophet also features versions much like a version control system: each change to the database has a version number. The entire history of the database is introspectable and replication simply replays changesets into replicas. Operations like create, read, update, delete and search are all atomic.
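To make the "replication replays changesets" idea concrete, here is a minimal Python sketch of my own. It is not Prophet's code or API (Prophet is written in Perl), and the names Changeset, Replica, update and pull_from are all hypothetical. It assumes each replica numbers its own changes and remembers the highest number it has applied from every other replica, and it deliberately ignores conflicts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Changeset:
    origin: str      # name of the replica that made the change (assumed field)
    sequence: int    # per-origin version number, starting at 1
    record_id: str
    updates: tuple   # ((property, new_value), ...)


@dataclass
class Replica:
    name: str
    records: dict = field(default_factory=dict)  # record_id -> {property: value}
    history: list = field(default_factory=list)  # every changeset applied, in order
    seen: dict = field(default_factory=dict)     # origin -> highest sequence applied

    def update(self, record_id, **props):
        """Make a local change and log it as a new changeset."""
        seq = self.seen.get(self.name, 0) + 1
        self._apply(Changeset(self.name, seq, record_id, tuple(props.items())))

    def _apply(self, cs):
        self.records.setdefault(cs.record_id, {}).update(cs.updates)
        self.history.append(cs)
        self.seen[cs.origin] = cs.sequence

    def pull_from(self, other):
        """Replay every changeset in `other` that this replica has not seen."""
        applied = 0
        for cs in other.history:
            if cs.sequence > self.seen.get(cs.origin, 0):
                self._apply(cs)
                applied += 1
        return applied


# Two replicas that are never online at the same time, e.g. a laptop and a USB key.
laptop = Replica("laptop")
usb = Replica("usb")
laptop.update("ticket-1", status="open", owner="jesse")
usb.pull_from(laptop)                      # the USB key catches up
usb.update("ticket-1", status="resolved")  # change made while disconnected
laptop.pull_from(usb)                      # only the new changeset is replayed
print(laptop.records["ticket-1"])
```

Because every applied changeset stays in the history list, the full history remains introspectable, and pulling twice from the same replica applies nothing the second time.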

The most important aspect of Prophet is its ability to semi-automatically resolve conflicts that arise when replicas are sync’ed with other replicas. Since changes can be applied to any replica, it is quite possible for two or more replicas to make changes to the same records, resulting in conflicts during the next sync. Prophet handles this via a voting mechanism where all replicas will get a chance to vote on conflicts. To make this possible, Prophet keeps a history of conflicts and outcomes of conflict resolutions.
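I don't know how Prophet implements its voting internally, so treat the following Python as a rough sketch under my own assumptions rather than a description of the real thing. The names Change, is_conflict and resolve are hypothetical. The sketch assumes each voter is a function that returns the value it prefers, that the majority wins, and that each outcome is recorded so the same conflict is settled the same way the next time it shows up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    record_id: str
    prop: str
    old_value: str   # value the changing replica started from
    new_value: str   # value it wants to write


def is_conflict(local_value, incoming):
    """Conflict: the local value is neither where the change started nor where it ends."""
    return local_value not in (incoming.old_value, incoming.new_value)


def resolve(local_value, incoming, voters, past_resolutions):
    """Pick a winning value by majority vote, reusing a recorded outcome if one exists."""
    key = (incoming.record_id, incoming.prop, local_value, incoming.new_value)
    if key in past_resolutions:
        return past_resolutions[key]
    votes = Counter(vote(local_value, incoming) for vote in voters)
    # Highest vote count wins; ties are broken by value so every replica agrees.
    winner, _ = max(votes.items(), key=lambda kv: (kv[1], kv[0]))
    past_resolutions[key] = winner
    return winner


# Two simple voting policies (assumptions, for illustration only).
def prefer_local(local_value, incoming):
    return local_value


def prefer_incoming(local_value, incoming):
    return incoming.new_value


change = Change("ticket-1", "status", old_value="open", new_value="resolved")
resolutions = {}
if is_conflict("stalled", change):
    outcome = resolve("stalled", change,
                      [prefer_local, prefer_incoming, prefer_incoming],
                      resolutions)
    print(outcome)
```

The resolutions dictionary plays the role of the conflict history: once an outcome is stored there, a later sync that hits the same conflict returns the stored answer without another vote.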

Near the end of the talk Jesse reminded everyone that Prophet is still very young; it was written in the last couple of months. The Perl codebase is fairly small, contains a lot of test cases and all the core functions of the database are working correctly. However, search is currently “half-assed”; several people have promised an index implementation for improved searching. If you’re interested in playing with Prophet and perhaps even jumping in and helping Jesse out, download a copy and start playing!

I haven’t had a chance to play with Prophet yet, but I’m excited that Jesse took the time to write this tool. Distributed databases with automatic conflict resolution are hard, which is probably the reason why we haven’t seen an established database like this yet. Even though it seems that just about everything is being connected to the net, I can see real value in Prophet and the types of disconnected/decentralized applications that it can enable.

Thanks for Prophet and thanks for the presentation Jesse!

  • For some reason this got me thinking of botnets. Be interesting to see what comes of it.

  • I think this looks like a really interesting project, especially since I am a large proponent of a more “ad hoc” distributed linked data web, a web that is inherently in itself the OS — not some big web mainframe at Big Data. In one of my earlier blog posts, I take a look at the implications and coming shifts of how and why large centralized repositories of data are bad:

    Projects like this can help further the mentality that we don’t need one big data bucket in the sky, that we can have distributed storage that is interwoven into the web all around us. This type of web can be more “distance vector” as opposed to “link state”, can be auto-discoverable at runtime, and be that set of loosely coupled pieces that O’Reilly has mentioned before.

    Good post, I definitely will have to take closer look at this project.

  • The analogy to sharecropping seems based in hyperbole: great for getting people riled up at your presentation, but it breaks down pretty quickly when you examine it. Try replacing a web store with a retail store and the web service provider with a landlord and I think you have a better model:

    * Landlords charge a flat rate by square footage (usage), not a percentage of your sales (like sharecroppers). (Probably a few exceptions, like fast food in airports)
    * Purchasing and configuring land and servers is an expensive initial cost
    * There are tax advantages to not owning property (land or servers)

    But like a web service provider you may have the same concerns:
    * How can you trust your landlord not to let the secret police in?
    * What if your landlord disagrees with what you are doing and asks you to move out?
    * If your landlord shuts you out (by changing the locks), you’re screwed!

    Despite the last three issues, there are millions of landlord and commercial tenant relationships out there that are more or less working (landlord reluctance to put in a good HVAC system aside).

  • Anybody interested in Prophet may also want to check out CouchDB, a great distributed and concurrent database that’s been in the works for a few years. I’d love to see a comparison of the two projects, and I’m sure they each have something to learn from the other.

  • I think Jonathan nails it with his Landlord analogy – much closer to reality than the sharecropping one.

    I can’t help but feel this sounds like more FUD related to Cloud Computing. Even McKinsey were doing some Cloud bashing in a recent report – I link to it here: http://inchpebbles.blogspot.com/2009/04/follow-up-to-mckinsey-fud-on-cloud.html