Database War Stories #9 (finis): Brian Aker of MySQL Responds

Brian Aker of MySQL sent me a few email comments about this whole “war stories” thread, which I reproduce here. The highlight, in his words: “Reading through the comments you got on your blog entry, these users are hitting on the same design patterns. There are very common design patterns for how to scale a database, and few sites really turn out to be all that original. Everyone arrives at certain truths: flat files with multiple dimensions don’t scale, you will need to partition your data in some manner, and in the end caching is a requirement.”

I agree about the common design patterns, but I didn’t hear that flat files don’t scale. What I heard is that some very big sites are saying that traditional databases don’t scale, and that the evolution isn’t from flat files to SQL databases, but from flat files to sophisticated custom file systems. Brian acknowledges that SQL vendors haven’t solved the problem, but doesn’t seem to think that anyone else has either.

Here are Brian’s comments in full:

While at the conference I spoke to an outfit who had stuck around a terabyte of data into just one table. The table had tiny little rows, and the primary key was not native to the database, aka they derived it from an external application and it was not really database friendly. They were looking for a solution to the table problem when in reality they needed a solution to their usage problem.

Predictably, the solution was to partition the database, with one master database for lookups to find out where the actual database holding the real data was. AKA I suggested that they partition their data, and as is often the case their data partitioned quite easily. This is the sort of use case I see over and over again. There is a talk I’ve been giving for years on how people lay out their database environment. It’s been interesting to watch what the converging use cases are, and every time I give the talk I find new insights on how people are creating clusters/creating scale out.

Reading through the comments you got on your blog entry, these users are hitting on the same design patterns. There are very common design patterns for how to scale a database, and few sites really turn out to be all that original. Everyone arrives at certain truths: flat files with multiple dimensions don’t scale, you will need to partition your data in some manner, and in the end caching is a requirement.

It’s also obvious that no one has fulltext done in a manner which is really right yet. The Lucene approach is the “shove it all in, hope you can find it” method, which is no better than Google with different weighting. Contextual relational environments are needed, but I am not seeing any SQL yet that makes me think the database vendors have solved the problem. The technology is there, but no one has found the common language that is required to make this work just yet.
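
For readers who want to see the partitioning pattern Brian describes (one master database used only for lookups, pointing at the database that holds the real data), here is a minimal sketch in Python. It is only an illustration under stated assumptions, not what the outfit he spoke to actually built: the class name `PartitionedStore`, the table and column names, the in-memory SQLite databases standing in for separate servers, and the round-robin placement of new keys are all hypothetical.

```python
import sqlite3


class PartitionedStore:
    """Hypothetical sketch: a master lookup database plus several data shards."""

    def __init__(self, num_shards):
        # Master database: holds only the external-key -> shard mapping.
        self.master = sqlite3.connect(":memory:")
        self.master.execute(
            "CREATE TABLE directory (ext_key TEXT PRIMARY KEY, shard_id INTEGER NOT NULL)"
        )
        # Shards: each one holds the real rows for the keys assigned to it.
        # In-memory SQLite stands in for separate database servers (assumption).
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for shard in self.shards:
            shard.execute(
                "CREATE TABLE records (ext_key TEXT PRIMARY KEY, payload TEXT)"
            )
        self._next_shard = 0

    def locate(self, ext_key):
        """Ask the master which shard holds this key (None if unknown)."""
        row = self.master.execute(
            "SELECT shard_id FROM directory WHERE ext_key = ?", (ext_key,)
        ).fetchone()
        return None if row is None else row[0]

    def put(self, ext_key, payload):
        shard_id = self.locate(ext_key)
        if shard_id is None:
            # New key: assign a shard round-robin (placement policy is an assumption).
            shard_id = self._next_shard % len(self.shards)
            self._next_shard += 1
            self.master.execute(
                "INSERT INTO directory (ext_key, shard_id) VALUES (?, ?)",
                (ext_key, shard_id),
            )
        self.shards[shard_id].execute(
            "INSERT OR REPLACE INTO records (ext_key, payload) VALUES (?, ?)",
            (ext_key, payload),
        )

    def get(self, ext_key):
        # Two steps: look up the shard in the master, then read from that shard.
        shard_id = self.locate(ext_key)
        if shard_id is None:
            return None
        row = self.shards[shard_id].execute(
            "SELECT payload FROM records WHERE ext_key = ?", (ext_key,)
        ).fetchone()
        return None if row is None else row[0]


store = PartitionedStore(num_shards=3)
for key, payload in [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]:
    store.put(key, payload)
```

Every read costs one extra lookup against the master, which is one reason caching the key-to-shard mapping tends to follow quickly once a site partitions this way.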

More entries in the database war stories series: Second Life, Bloglines and Memeorandum, Flickr, NASA World Wind, Craigslist, O’Reilly Research, Google File System and BigTable, Findory and Amazon.

  • Jim

    He didn’t say that flat files don’t scale, he said flat files with multiple dimensions don’t scale. That’s a real and vital distinction, otherwise no one would use relational databases.

  • http://tim.oreilly.com Tim O'Reilly

    Jim — You’re right. But I was still surprised by how many people were going the custom file system route. It will be interesting to see how commercial databases evolve in response to the demands of web applications.

  • http://glinden.blogspot.com Greg Linden

    On dissatisfaction with current commercial databases and the need for custom solutions, it is interesting that Google appears to be building their own custom database, BigTable, to handle their needs.

  • http://www.postgresql.org Josh Berkus

    Tim, Brian,

    You may want to take a look at OpenFTS, PostgreSQL’s full text search. It’s considerably more sophisticated than MySQL FTI, but unlike Lucene is database-integrated. Not that it’s where it needs to be yet, but it’s developing rapidly (the team just committed Generic Inverted Indexes) and it should be possible to do some really cool text tricks within a couple of versions.

  • http://www.postgresql.org Josh Berkus

    Tim,

    One more comment. RDBMSes are *general-purpose* tools. It’s always going to be possible to build a *narrow-focus* tool on custom code which is faster and more scalable than an RDBMS. For example, I wouldn’t suggest putting Google web search on an RDBMS. But, if you don’t know what you might be doing with your data in the future — a common situation — RDBMSes are irreplaceable.