Database War Stories #9 (finis): Brian Aker of MySQL Responds

Brian Aker of MySQL sent me a few email comments about this whole “war stories” thread, which I reproduce here. Highlight — he says: “Reading through the comments you got on your blog entry, these users are hitting on the same design patterns. There are very common design patterns for how to scale a database, and few sites really turn out to be all that original. Everyone arrives at certain truths, flat files with multiple dimensions don’t scale, you will need to partition your data in some manner, and in the end caching is a requirement.”

I agree about the common design patterns, but I didn’t hear that flat files don’t scale. What I heard is that some very big sites are saying that traditional databases don’t scale, and that the evolution isn’t from flat files to SQL databases, but from flat files to sophisticated custom file systems. Brian acknowledges that SQL vendors haven’t solved the problem, but doesn’t seem to think that anyone else has either.

Here are Brian’s comments in full:

While at the conference I spoke to an outfit who had stuck around a terabyte of data into just one table. The table had tiny little rows, and the primary key was not native to the database, aka they derived it from an external application and it was not really database friendly. They were looking for a solution to the table problem when in reality they needed a solution to their usage problem.

Predictably the solution was to partition the database with one master database for lookups to find out where the actual database holding the real data was. AKA I suggested that they partition their data, and as is often the case their data partitioned quite easily. This is the sort of use case I see over and over again. There is a talk I’ve been giving for years on how people lay out their database environment. It’s been interesting to watch what the converging use cases are, and every time I give the talk I find new insights on how people are creating clusters/creating scale out.
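To make the pattern Brian describes concrete, here is a minimal sketch of a partitioned store with a master lookup. It is not his design or anything MySQL ships: the class names (`ShardDirectory`, `PartitionedStore`), the round-robin placement rule, and the use of in-memory dictionaries in place of real database servers are all assumptions made for illustration.

```python
class ShardDirectory:
    """Hypothetical stand-in for the master lookup database.

    It only records which shard holds each key; it stores no row data.
    """

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self._locations = {}   # key -> shard id
        self._next = 0         # round-robin counter (assumed placement rule)

    def locate(self, key):
        # Return the shard id for a known key, or None if never assigned.
        return self._locations.get(key)

    def assign(self, key):
        # Place new keys round-robin; existing keys keep their shard.
        if key not in self._locations:
            self._locations[key] = self._next % self.shard_count
            self._next += 1
        return self._locations[key]


class PartitionedStore:
    """Routes reads and writes through the directory to the right shard."""

    def __init__(self, shard_count=4):
        self.directory = ShardDirectory(shard_count)
        # Each dict stands in for a separate database holding the real data.
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, row):
        shard_id = self.directory.assign(key)
        self.shards[shard_id][key] = row

    def get(self, key):
        shard_id = self.directory.locate(key)
        if shard_id is None:
            return None
        return self.shards[shard_id].get(key)


store = PartitionedStore(shard_count=2)
store.put("a", {"v": 1})
store.put("b", {"v": 2})
store.put("c", {"v": 3})
```

Every read costs one lookup against the directory plus one against a shard, so in a real deployment the directory would presumably be small, heavily cached, and replicated. An explicit directory (rather than hashing the key) also copes with keys that are not database friendly, since placement does not depend on the key's structure.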

Reading through the comments you got on your blog entry, these users are hitting on the same design patterns. There are very common design patterns for how to scale a database, and few sites really turn out to be all that original. Everyone arrives at certain truths, flat files with multiple dimensions don’t scale, you will need to partition your data in some manner, and in the end caching is a requirement.

It’s also obvious that no one has fulltext done in a manner which is really right yet. The Lucene approach is the “shove it all in, hope you can find it” method, which is no better than Google with different weighting. Contextual relational environments are needed, but I am not seeing any SQL yet that makes me think the database vendors have solved the problem. The technology is there, but no one has found the common language that is required to make this work just yet.

More entries in the database war stories series: Second Life, Bloglines and Memeorandum, Flickr, NASA World Wind, Craigslist, O’Reilly Research, Google File System and BigTable, Findory and Amazon.