OSCON day 2: Prophet, your path out of the cloud

Some of you may know Jesse Vincent as the guy who hands out snarky t-shirts like last year’s “My free software runs your business” shirt. But today I got to see Jesse’s more serious side when I attended his “Prophet, your path out of the cloud” presentation. He started his session by outlining why cloud computing may not be the best idea and then went on to talk about his new distributed database called Prophet.

Since I’ve been pondering hosting MusicBrainz’ web services on EC2, I found his analogy of cloud computing as “digital sharecropping” quite apt. Wikipedia defines sharecropping as “a system of agriculture or agricultural production in which a landowner allows a tenant to use the land in return for a share of the crop produced on the land (e.g., 50 percent of the crop).” History tells us that sharecropping didn’t work out so well for the farmers: many of them ended up dependent on the landowners and heavily in debt to them.

In the early days of computing, people ran programs they didn’t own on machines they didn’t own (mainframes were leased from the manufacturer). People had no control over when these machines got updates and had very little control in general. In the ’80s things got better as PCs started appearing, only to lock users into things like Windows. And today, thanks to cloud computing, people don’t need servers, software or anything else to host and run websites; a web browser is all it takes.

So, what happens when your cloud provider goes down? Your website or perhaps your business stops dead in its tracks, as we saw last week when Amazon’s S3 service went kaput. Also, how do you trust your service provider to not send a copy of all of your email to the Chinese secret police? What if you get shut out because your service provider disagrees with what you are doing?

In short, if your provider shuts you out, you’re screwed! What do you do then, when your work is tied to that provider (e.g. EC2 AMIs)? Jesse’s message is that if you use hosted applications, you’re going to get burned. Maybe not today, but at some point it will catch up with you! This thought motivated him to work on a distributed database that needs no hosted servers and can live with data being synchronized via sneaker-net.

Jesse describes Prophet as a grounded database, since it runs on the edge of the network — not in the cloud! It syncs with services that you already use (mostly bug tracking systems since Jesse is the father of RT). Replicated services are called Foreign Replicas.

Prophet is semi-relational; it is possible to do relational joins, but they are expensive. Prophet follows a model similar to Amazon’s SimpleDB to keep things nice and simple. But the most significant feature of Prophet is that it is peer-to-peer: you can update data in any of its replicated copies, re-sync the databases later on, and have conflicts between two copies of the data resolved automatically. You can pull from any replica or push to any replica and the changes will propagate properly. Changes can be pushed via rsync or a standard filesystem and pulled via HTTP.

Prophet is also disconnected: replicas can live apart from other databases for prolonged periods of time. Not all databases have constant network connections, and Prophet handles this case, so a replica can live on a USB key, for instance. Prophet also features versions, much like a version control system: each change to the database has a version number. The entire history of the database is introspectable, and replication simply replays changesets into replicas. Operations like create, read, update, delete and search are all atomic.
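
To make the "replay changesets into replicas" idea a bit more concrete, here is a toy sketch in Python (Prophet itself is written in Perl). None of this is Prophet's actual API: the Changeset and Replica classes, their fields and the pull_from method are names I made up for illustration. The sketch assumes each replica numbers its own changes and remembers the highest number it has seen from every origin. It ignores conflicts entirely, so the last changeset applied simply wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """One versioned change to one record (hypothetical structure)."""
    origin: str      # name of the replica that created the change
    sequence: int    # version number, counted per origin
    record_id: str
    changes: tuple   # ((property, new_value), ...)


class Replica:
    """A toy replica: current records plus the full change history."""

    def __init__(self, name):
        self.name = name
        self.records = {}   # record_id -> {property: value}
        self.history = []   # every changeset applied here, in order
        self.seen = {}      # origin -> highest sequence applied so far

    def update(self, record_id, **props):
        """Record a local change as a new changeset and apply it."""
        sequence = self.seen.get(self.name, 0) + 1
        changeset = Changeset(self.name, sequence, record_id,
                              tuple(sorted(props.items())))
        self._apply(changeset)
        return changeset

    def _apply(self, changeset):
        record = self.records.setdefault(changeset.record_id, {})
        record.update(dict(changeset.changes))
        self.history.append(changeset)
        self.seen[changeset.origin] = changeset.sequence

    def pull_from(self, other):
        """Replay every changeset from `other` that we have not seen yet."""
        applied = 0
        for changeset in other.history:
            if changeset.sequence > self.seen.get(changeset.origin, 0):
                self._apply(changeset)
                applied += 1
        return applied


laptop = Replica("laptop")
usb_key = Replica("usb_key")

laptop.update("ticket-1", status="open", summary="Sync fails")
usb_key.pull_from(laptop)                 # the USB key catches up
usb_key.update("ticket-1", status="resolved")
laptop.pull_from(usb_key)                 # only the new change is replayed
print(laptop.records["ticket-1"])
```

Because each replica keeps the whole history, including changesets that originated elsewhere, a change can travel from one replica to another through a third one that was never online at the same time as both, which is what makes the sneaker-net scenario work.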

The most important aspect of Prophet is its ability to semi-automatically resolve conflicts that arise when replicas are sync’ed with other replicas. Since changes can be applied to any replica, it is quite possible for two or more replicas to make changes to the same records, resulting in conflicts during the next sync. Prophet handles this via a voting mechanism where all replicas will get a chance to vote on conflicts. To make this possible, Prophet keeps a history of conflicts and outcomes of conflict resolutions.
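
Jesse did not go into the details of the voting mechanism, so the following Python sketch is only my guess at the general shape of the idea, not Prophet's algorithm. The function names detect_conflicts and resolve_by_vote are hypothetical. The assumptions: a conflict exists when two replicas changed the same property from a common base value to different values, and a conflict is settled automatically only when the recorded resolutions from other replicas show a clear winner. Anything else is left for a human, which is how I read "semi-automatically".

```python
from collections import Counter


def detect_conflicts(base, ours, theirs):
    """Return {property: (our_value, their_value)} for conflicting edits.

    All three arguments are dicts holding one record's properties:
    the common ancestor, our replica's copy and the other replica's copy.
    """
    conflicts = {}
    for prop in set(ours) | set(theirs):
        old = base.get(prop)
        mine = ours.get(prop, old)
        yours = theirs.get(prop, old)
        # Only a conflict if both sides changed it, and changed it differently.
        if mine != old and yours != old and mine != yours:
            conflicts[prop] = (mine, yours)
    return conflicts


def resolve_by_vote(candidates, recorded_votes):
    """Pick the candidate most replicas chose before, or None if unclear.

    `recorded_votes` stands in for the history of earlier resolutions
    of this same conflict on other replicas.
    """
    tally = Counter(vote for vote in recorded_votes if vote in candidates)
    ranked = tally.most_common(2)
    if not ranked:
        return None                      # nobody has resolved this yet
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None                      # tie: ask a human
    return ranked[0][0]


base = {"status": "open", "owner": "jesse"}
ours = {"status": "resolved", "owner": "jesse"}
theirs = {"status": "rejected", "owner": "alice"}

conflicts = detect_conflicts(base, ours, theirs)
for prop, candidates in conflicts.items():
    winner = resolve_by_vote(candidates, ["resolved", "rejected", "resolved"])
    print(prop, candidates, "->", winner)
```

Note that the owner change is not a conflict here, since only one side touched it. Keeping a record of past resolutions means the same conflict does not have to be argued over again every time two replicas meet.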

Near the end of the talk Jesse reminded everyone that Prophet is still very young: it was written in the last couple of months. The Perl codebase is fairly small and contains a lot of test cases, and all the core functions of the database are working correctly. However, search is currently “half-assed”; several people have promised an index implementation for improved searching. If you’re interested in playing with Prophet and perhaps even jumping in to help Jesse out, download a copy and start playing!

I haven’t had a chance to play with Prophet yet, but I’m excited that Jesse took the time to write this tool. Distributed databases with automatic conflict resolution are hard, which is probably the reason why we haven’t seen an established database like this yet. Even though it seems that just about everything is being connected to the net, I can see real value in Prophet and the types of disconnected/decentralized applications that it can enable.

Thanks for Prophet and thanks for the presentation, Jesse!