Werner Vogels just posted the full text of “Dynamo: Amazon’s Highly Available Key-Value Store”, which he and his team will be presenting at the ACM Symposium on Operating Systems Principles next week. While this is not (and, Werner emphasizes, won’t become) a new web service, it’s an excellent read for anyone thinking about scalable web sites.
Werner says: “We submitted the technology for publication in SOSP because many of the techniques used in Dynamo originate in the operating systems and distributed systems research of the past years; DHTs, consistent hashing, versioning, vector clocks, quorum, anti-entropy based recovery, etc. As far as I know Dynamo is the first production system to use the synthesis of all these techniques, and there are quite a few lessons learned from doing so. The paper is mainly about these lessons.”
The operational challenges and solutions presented in the paper are particularly interesting. Here is how the paper frames the problem:
One of the lessons our organization has learned from operating Amazon’s platform is that the reliability and scalability of a system is dependent on how its application state is managed. Amazon uses a highly decentralized, loosely coupled, service oriented architecture consisting of hundreds of services. In this environment there is a particular need for storage technologies that are always available. For example, customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. Therefore, the service responsible for managing shopping carts requires that it can always write to and read from its data store, and that its data needs to be available across multiple data centers.