Two themes emerged at this week’s MySQL conference: Mix your relational database with less formal solutions and move to the cloud. Naturally, the event included many other talks of a more immediate practical nature: data warehousing and business intelligence, performance (both in MySQL configuration and in the environment, which includes the changes caused by replacing disks with Flash), how to scale up, and new features in both MySQL and its children. But everyone seemed to agree that MySQL does not stand alone.
The world of databases have changed both in scale and in use. As Baron Schwartz said in his broad-vision keynote, databases are starting to need to handle petabytes. And he criticized open source database options as having poorer performance than proprietary ones. As for use, the databases struggle to meet two types of requirements: requests from business users for expeditious reports on new relationships, and data mining that traverses relatively unstructured data such as friend relationships, comments on web pages, and network traffic.
Some speakers introduced NoSQL with a bit of sarcasm, as if they had to provide an interface to Hbase or MongoDB as a check-off item. At the other extreme, in his keynote, Brian Aker summed up his philosophy about Drizzle by saying, “We are not an island unto ourselves, we are infrastructure.”
Judging from the informal audience polls, most of the audience had not explored NoSQL or the cloud yet. Most of the speakers about these technologies offered a mix of basic introductory material and useful practical information to meet the needs of their audience, who came, listened, and asked questions. I heard more give and take during the talks about traditional MySQL topics, because the audience seemed well versed in them.
The analysts and experts are telling us we can save money and improve scalability using EC2-style cloud solutions, and adopt new techniques to achieve the familiar goals of reliability and fast response time. I think a more subtle challenge of the cloud was barely mentioned: it encourages a kind of replication that fuzzes the notion of consistency and runs against the ideal of a unique instance for each item of data. Of course, everyone uses replication for production relational databases anyway, both to avert disaster and for load balancing, so the ideal has been blurred for a long time. As we explore the potential of cloud systems as content delivery networks, they blur the single-instance ideal. Sarah Novotny of Blue Gecko, while warning about the risks of replication, gave a talk about some practical considerations in making it work, such as tolerating inconsistency for data about sessions.
What about NoSQL solutions, which have co-existed with relational databases for decades? Everybody knows about key-value stores, and Memcache has always partnered with MySQL to serve data quickly. I had a talk with Roger Magoulas, an O’Reilly researcher, about some of the things you sacrifice if you use a NoSQL solution instead of a relational database and why that might be OK.
- Redundancy and consistency
Instead of storing an attribute such as “supplier” or “model number” in a separate table, most NoSQL solutions make it a part of the record for each individual member of the database. The increased disk space or RAM required becomes irrelevant in an age when those resources are so cheap and abundant. What’s more significant is that a programmer can store any supplier or model number she wants, instead of having to select from a fixed set enforced by foreign key constraints. This can introduce inconsistencies and errors, but practical database experts have known for a long time that perfect accuracy is a chimera (see the work of Jeff Jonas) and modern data analytics work around the noise. When you’re looking for statistical trends, whether in ocean samples or customer preferences, you don’t care whether 800 of your 8 million records have corrupt data in the field you’re aggregating.
- Decentralizing decisions and control
A relational database potentially gives the DBA most of the power: it is the DBA that creates the schema, defines stored procedures and triggers, and detects and repairs inconsistencies. Modern databases such as MySQL have already blurred the formerly rigid boundaries between DBA and application programmer. In the early days, the programmer had to do a lot of the work that was reserved for DBAs in traditional databases. NoSQL clearly undermines the control freaks even more. As I’ve already said, enforcing consistency is not as important nowadays as it once seemed, and modern programming languages offer other ways (such as enums and sets) to prevent errors.
I think this is still the big advantage of relational databases. Their complicated schemas and join semantics allow data mining and extended uses that evolve over the years. Many NoSQL databases are designed around the particular needs of an organization at a particular time, and require records to be ordered in the way you want to access them. And I think this is why, as discussed in some of the sessions at this conference, many people start with their raw data in some NoSQL data store and leave the results of their processing in a relational database.
The mood this week was business-like. I’ve attended conferences held by emerging communities that crackle with the excitement of building something new that no one can quite anticipate; the MySQL conference wasn’t like that. The attendees have a job to do and an ROI to prove; they wanted to learn whatever would help them do that.
But the relational model still reflects most of the data handling needs we have, so MySQL will stick around. This may actually be the best environment it has ever enjoyed. Oracle still devotes a crack programming team (which includes several O’Reilly authors) to meeting corporate needs through performance improvements and simple tools. Monty Program has forked off MariaDB and Percona has popularized XtraDB, all the while contributing new features under the GPL that any implementation can use. Drizzle strips MySQL down to a core while making old goals such as multi-master replication feasible. A host of companies in layered applications such as business intelligence cluster around MySQL.
MySQL spawns its own alternative access modes, such as the HandlerSocket plugin that returns data quickly to simple queries while leaving the full relational power of the database in place for other uses. And vendors continue to find intriguing alternatives such as the Xeround cloud service that automates fail-over, scaling, and sharding while preserving MySQL semantics. I don’t think any DBA’s skills will become obsolete.