Database War Stories #8: Findory and Amazon

Once Greg Linden had pinged me about BigTable (leading to yesterday’s entry), it occurred to me to ask Greg for his own war stories, both at Findory and Amazon. We hear a recurrent theme: the use of flat files and custom file systems. Despite the furor that ensued in the comments when Mark Fletcher and Gabe Rivera noted that they didn’t use traditional SQL databases, Web 2.0 applications really do seem to have different demands, especially once they get to scale. But history shows us that custom, home-grown solutions created by alpha geeks point the way to new entrepreneurial opportunities… Greg wrote:

There are a couple stories I could tell, one about the early systems at Amazon, one about the database setup at Findory.

On Findory, our traffic and crawl are much smaller than those of sites like Bloglines, but, even at our size, the system needs to be carefully architected to rapidly serve up fully personalized pages for each user that change immediately after each new article is read.

Our read-only databases are flat files — Berkeley DB to be specific — and are replicated out using our own replication management tools to our webservers. This strategy gives us extremely fast access from the local filesystem. We make thousands of random accesses to this read-only data on each page serve; Berkeley DB offers the performance necessary to still serve our personalized pages rapidly under this load.
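
Findory’s replication tools and data layout aren’t shown in the post, but the access pattern is easy to illustrate. Here is a minimal Python sketch of the read side, using the standard-library dbm module as a stand-in for a Berkeley DB file that has been replicated to the webserver’s local disk (the path and key format are made up for the example):

```python
import dbm

# Open a locally replicated, read-only key-value file (hypothetical path).
# Findory used Berkeley DB; dbm stands in here for the same open-once,
# read-many pattern against the local filesystem.
store = dbm.open("/data/readonly/articles", "r")

def lookup(key: str) -> bytes | None:
    """One of the thousands of small random reads made on each page serve."""
    try:
        # No network hop: the file is on local disk and, for a compact data
        # set, usually sitting in the operating system's page cache.
        return store[key.encode("utf-8")]
    except KeyError:
        return None

print(lookup("article:12345"))
```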

Our much smaller read-write data set, which includes information like each user’s reading history, is stored in MySQL. MySQL MyISAM works very well for this type of non-critical data since speed is the primary concern and more sophisticated transactional support is not important. While it has not been necessary yet, our intention is to scale our MySQL database with horizontal partitioning, though we may experiment with MySQL Cluster as well.
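
Greg doesn’t say how the horizontal partitioning would be done, so the following is only a sketch of the common approach: route each user’s read-write rows to one of N MySQL instances by hashing the user id. The shard list, database names, and example query are all assumptions made for illustration:

```python
import hashlib

# Hypothetical shard map: each entry is one MySQL instance holding a slice
# of the read-write data, such as per-user reading history.
SHARDS = [
    {"host": "db1.internal", "database": "findory_rw_0"},
    {"host": "db2.internal", "database": "findory_rw_1"},
    {"host": "db3.internal", "database": "findory_rw_2"},
]

def shard_for(user_id: str) -> dict:
    """Map a user to a shard with a stable hash so the mapping never drifts."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The application would open a connection to shard_for(user_id) with its
# MySQL client of choice and run the per-user queries there, for example:
#   SELECT article_id, read_at FROM reading_history WHERE user_id = %s
print(shard_for("user:42"))
```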

For both read-only and read-write data, we have been careful to keep the data formats compact to ensure that the total active data set easily fits in main memory. We try to avoid disk access as much as possible.
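
The post doesn’t describe the record layout, but as an illustration of the kind of compactness that keeps an active data set in RAM, here is a sketch that packs a hypothetical reading-history entry into a fixed 16-byte record with Python’s struct module:

```python
import struct

# Hypothetical fixed-width record: 32-bit user id, 32-bit article id, and a
# 64-bit timestamp -- 16 bytes per entry, so ten million entries fit in
# roughly 160 MB of memory.
RECORD = struct.Struct("<IIq")

def pack_entry(user_id: int, article_id: int, read_at: int) -> bytes:
    return RECORD.pack(user_id, article_id, read_at)

def unpack_entry(blob: bytes) -> tuple[int, int, int]:
    return RECORD.unpack(blob)

entry = pack_entry(42, 12345, 1143849600)
print(len(entry), unpack_entry(entry))  # 16 (42, 12345, 1143849600)
```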

After all of this, Findory is able to serve fully personalized pages, different for each reader, that change immediately when each person reads a new article, all in well under 100ms. People don’t like to wait. We believe getting what people need quickly and reliably is an important part of the user experience.

[Regarding Amazon], as it turns out, I have already written up a fairly detailed description of Amazon’s early (1997) systems on my weblog. An excerpt from that post:

[We] were going to take Amazon from one webserver to multiple webservers.

There would be two staging servers, development and master, and then a fleet of online webservers. The staging servers were largely designed for backward compatibility. Developers would share data with development when creating new website features. Customer service, QA, and tools would share data with master. This had the added advantage of making master a last wall of defense where new code and data would be tested before it hit the online servers.

Read-only data would be pushed out through this pipeline. Logs would be pulled off the online servers. For backward compatibility with log processing tools, logs would be merged so they looked like they came from one webserver and then put on a fileserver.
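
The merge step is the only mechanical part of that paragraph, and it can be sketched in a few lines. The snippet below interleaves several already-sorted per-webserver logs into one stream with Python’s heapq.merge; the file locations and the assumption that each line starts with a sortable timestamp are mine, not details from the original post:

```python
import glob
import heapq

def merged_log_lines(pattern: str = "/logs/www*/access.log"):
    """Interleave per-webserver access logs into one time-ordered stream.

    Assumes each input log is already sorted by time and that lines begin
    with a sortable timestamp (assumptions; the real 1997 format isn't shown).
    """
    files = [open(path) for path in glob.glob(pattern)]
    try:
        # heapq.merge lazily merges the already-sorted inputs.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()

# Downstream log-processing tools can then read the merged file as if it had
# come from a single webserver.
with open("/logs/merged/access.log", "w") as out:
    out.writelines(merged_log_lines())
```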

Stepping out for a second, this is a point where we really would have liked to have a robust, clustered, replicated, distributed file system. That would have been perfect for read-only data used by the webservers.

NFS isn’t even close to this. It isn’t clustered or replicated. It freezes all clients when the NFS server goes down. Ugly. Options that are closer to what we wanted, like CODA, were (and still are) in the research stage.

Without a reliable distributed file system, we were down to manually giving each webserver a local copy of the read-only data.
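
In practice that manual distribution amounts to a push loop. Purely as an illustration (the host names, paths, and use of rsync are my assumptions; the original tools aren’t described), a minimal sketch:

```python
import subprocess

# Hypothetical fleet and paths; the real 1997 push tooling isn't described.
WEBSERVERS = ["www1.internal", "www2.internal", "www3.internal"]
SOURCE = "/staging/readonly-data/"
DEST = "/var/local/readonly-data/"

def push_readonly_data() -> None:
    """Copy the read-only data set onto each webserver's local filesystem."""
    for host in WEBSERVERS:
        # rsync only transfers changed files, keeping repeated pushes cheap.
        subprocess.run(
            ["rsync", "-az", "--delete", SOURCE, f"{host}:{DEST}"],
            check=True,
        )

if __name__ == "__main__":
    push_readonly_data()
```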

More entries in the database war stories series: Second Life, Bloglines and Memeorandum, Flickr, NASA World Wind, Craigslist, O’Reilly Research, Google File System and BigTable, Brian Aker of MySQL Responds.
