Aker presented an OpenStack MySQL service developed by his current
employer, Hewlett-Packard. His keynote retold the story that had led
over the years to his developing Drizzle (a true fork of MySQL
that tries to return it to its lightweight, Web-friendly roots) and
eventually working on cloud computing for HP. He described modularity,
effective use of multiple cores, and cloud deployment as the future of
databases.
A panel on the second day of the conference brought together high-level
managers from many of the companies that have entered the MySQL space
from a variety of directions in a high-level discussion of the
database engine’s future. Like most panels, the conversation ranged
over a variety of topics–NoSQL, modular architecture, cloud
computing–but hit some depth only on the topic of security, which was
not represented very strongly at the conference and was discussed here
at the insistence of Slavik Markovich from McAfee.
Keynote by Brian Aker.
Many of the conference sessions disappointed me, being either very
high level (although presumably useful to people who are really new to
various topics, such as Hadoop or flash memory) or unvarnished
marketing pitches. I may have judged the latter too harshly though,
because a decent number of attendees came, and stayed to the end, and
crowded around the speakers for information.
Two talks, though, were so fast-paced and loaded with detail that my
typing couldn’t possibly keep up with the speaker.
One such talk was the keynote
by Mark Callaghan of Facebook. (Like the other keynotes, it should be
posted online soon.) A smattering of points from it:
Percona and MariaDB are adding critical features that make replication
and InnoDB work better.
When a logical backup runs, it is responsible for 50% of IOPS.
Defragmenting InnoDB improves compression.
Resharding is not worthwhile for a large, busy site (an insight also
discovered by Pinterest, as I reported earlier).
The other fact-filled talk was by
Yoshinori Matsunobu of Facebook, and concerned how to achieve
NoSQL-like speeds while sticking with MySQL and InnoDB. Much of the
talk discussed an InnoDB memcached plugin, which unfortunately is
still in the “lab” or “pre-alpha” stage. But he also suggested some
other ways to improve performance, some involving Memcache and others
not:
Coding directly with the storage engine API, which is storage-engine
independent.
Using HandlerSocket, which queues write requests and performs them
through a single thread, avoiding costly fsync() calls. This can
achieve 30,000 writes per second, robustly.
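
The idea of queuing writes and committing them from a single thread can be sketched in a few lines of Python. This is only an illustration of the general technique (one fsync() per batch instead of one per write), not HandlerSocket's code or API; the BatchedWriter class, its methods, and the log-file format are all hypothetical names invented for the example.

```python
import os
import queue
import tempfile
import threading


class BatchedWriter:
    """Hypothetical writer: many queued requests, one thread, one fsync() per batch."""

    def __init__(self, path, max_batch=1000):
        self._file = open(path, "ab")
        self._queue = queue.Queue()
        self._max_batch = max_batch
        self.fsync_calls = 0
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def write(self, record):
        """Queue a record; the returned event is set once it has been fsynced."""
        done = threading.Event()
        self._queue.put((record, done))
        return done

    def close(self):
        self._queue.put(None)  # sentinel: flush what is pending, then stop
        self._thread.join()
        self._file.close()

    def _run(self):
        stopping = False
        while not stopping:
            item = self._queue.get()  # block until at least one request arrives
            if item is None:
                break
            batch = [item]
            # Drain whatever else is already waiting, up to max_batch requests.
            while len(batch) < self._max_batch:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stopping = True
                    break
                batch.append(nxt)
            self._file.write(b"".join(rec + b"\n" for rec, _ in batch))
            self._file.flush()
            os.fsync(self._file.fileno())  # a single sync covers the whole batch
            self.fsync_calls += 1
            for _, done in batch:
                done.set()


def run_demo(n=100):
    """Queue n writes before the writer starts, so they land in one batch."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "writes.log")
        writer = BatchedWriter(path)
        events = [writer.write(b"row %d" % i) for i in range(n)]
        writer.start()
        writer.close()
        with open(path, "rb") as f:
            lines = f.read().splitlines()
    return writer.fsync_calls, len(lines), all(e.is_set() for e in events)
```

Calling run_demo() queues all the requests before the writer thread starts, so they are flushed together. Under a live load the batch size would simply follow however many requests pile up while the previous sync is in progress.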
Matsunobu claimed that many optimizations are available within MySQL
because a lot of data can fit in main memory. For instance, if you
have 10 million users and store 400 bytes per user, the entire user
table can fit in 20 GB. Matsunobu’s tests have shown that most CPU time
in MySQL is spent in functions that are not essential for processing
data, such as opening and closing a table. Each statement opens a
separate connection, which in turn requires opening and closing the
table again. Furthermore, a lot of data is sent over the wire besides
the specific fields requested by the client. The solutions in the talk
evade all this overhead.
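
The sizing argument above is easy to check with back-of-the-envelope arithmetic. The sketch below computes only the raw payload (rows times bytes per row); the function name is made up for illustration, and the reading of the 20 GB figure as an allowance for indexes and storage overhead is my assumption, not something stated in the talk.

```python
def table_size_gb(rows: int, bytes_per_row: int) -> float:
    """Raw payload size in decimal gigabytes (10**9 bytes), ignoring all overhead."""
    return rows * bytes_per_row / 10**9


# Figures quoted in the text: 10 million users, 400 bytes per user.
raw_gb = table_size_gb(10_000_000, 400)

# How many times the raw payload fits into the 20 GB quoted above.
headroom = 20 / raw_gb

print(f"raw payload: {raw_gb:.1f} GB, headroom within 20 GB: {headroom:.1f}x")
```

The raw payload comes to 4 GB, so a 20 GB budget leaves a wide margin, and either number is within reach of main memory on a well-equipped server.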
The commercial ecosystem
Both as vendors and as sponsors, a number of companies have always
lent another dimension to the MySQL conference. Some of these have
nothing to do with MySQL itself, but offer drop-in replacements for it.
Others have found a niche serving MySQL users. Here are a few that I
happened to talk to:
Clustrix provides a very
different architecture for relational data. They handle sharding
automatically, permitting such success stories as the massive scaling
up of the social media site Massive Media NV without extra
administrative work. Clustrix also claims to be more efficient by
breaking queries into fragments (such as the WHERE clauses of joins)
and executing them on different nodes, passing around only the data
produced by each clause.
Akiban also offers faster
execution through a radically different organization of data. They
flatten the tables of a normalized database into a single
data structure: for instance, a customer and his orders may be located
sequentially in memory. This seems to me an import of the document
store model into the relational model. Creating, in effect, an object
that maps pretty closely to the objects used in the application
program, Akiban allows common queries to be executed very quickly, and
could be deployed as an adjunct to a MySQL database.
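
To make the flattening idea concrete, here is a minimal sketch of storing a customer and his orders next to each other in one sorted structure. It is my own illustration of the general approach, not Akiban's storage format; the table contents, key layout, and function names are all hypothetical.

```python
from bisect import bisect_left

# Two normalized "tables", deliberately out of order.
customers = [
    {"customer_id": 2, "name": "Bo"},
    {"customer_id": 1, "name": "Ann"},
]
orders = [
    {"order_id": 11, "customer_id": 1, "total": 30},
    {"order_id": 10, "customer_id": 1, "total": 25},
    {"order_id": 12, "customer_id": 2, "total": 40},
]


def interleave(customers, orders):
    """Return parallel (keys, rows) lists with each customer followed by its orders."""
    entries = []
    for c in customers:
        # Sort key (customer_id, 0, 0): the customer row precedes its orders.
        entries.append(((c["customer_id"], 0, 0), c))
    for o in orders:
        # Sort key (customer_id, 1, order_id): orders sort right after their customer.
        entries.append(((o["customer_id"], 1, o["order_id"]), o))
    entries.sort(key=lambda e: e[0])
    keys = [k for k, _ in entries]
    rows = [r for _, r in entries]
    return keys, rows


def customer_with_orders(keys, rows, customer_id):
    """Fetch a customer and all of its orders as one contiguous slice."""
    start = bisect_left(keys, (customer_id, 0, 0))
    end = bisect_left(keys, (customer_id + 1, 0, 0))
    return rows[start:end]


keys, rows = interleave(customers, orders)
group = customer_with_orders(keys, rows, 1)
```

Here group holds Ann's row followed by orders 10 and 11, read from one contiguous range with no join. That adjacency is the property the flattened layout is meant to provide.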
Tokutek produced a drop-in
replacement for InnoDB. The founders developed a new data structure
called a fractal tree as a faster alternative to the B-tree structures
normally used for indexes. The existence of Tokutek vindicates both
the open source distribution of MySQL and its unique modular design,
because these allowed Tokutek’s founders to do what they do
best–create a new storage engine–without needing to create a whole
database engine with the related tools and interfaces it would
require.
Nimbus Data Systems creates a
flash-based hardware appliance that can serve as a NAS or SAN to
support MySQL. They support a large number of standard data transfer
protocols, such as InfiniBand, and provide such optimizations as
caching writes in DRAM and making sure they write complete 64KB blocks
to flash, thus speeding up transfers as well as preserving the life of
the flash memory.
A low-key developer’s day followed Percona Live on Friday. I talked to
people in the Drizzle and
Sphinx projects.
Because Drizzle is a relatively young project, its talks were aimed mostly at
developers interested in contributing. I heard talks about their
kewpie test framework and about build and release conventions. But in
keeping with its goal to make database use easy and lightweight, the
project has added some cool features.
Thanks to a JSON
interface and a built-in web server, Drizzle now presents you with a
Web interface for entering SQL commands. The Web interface translates
Drizzle’s output to simple HTML tables for display, but you can also
capture the JSON directly, making programmatic access to Drizzle
easier. A developer explained to me that you can also store JSON
directly in Drizzle; it is simply stored as a single text column and
the JSON fields can be queried directly. This reminded me of an XQuery
interface added to some database years ago. There too, the XML was
simply stored as a text field and a new interface was added to run the
queries.
Sphinx, in contrast to Drizzle, is a mature product with commercial
support and (as mentioned earlier in the article) production
deployments at places such as craigslist, as well as an O’Reilly
book. I understood better, after attending today’s sessions, what
makes Sphinx appealing. Its quality is unusually high, due to the use
of sophisticated ranking algorithms from the research literature. The
team is looking at recent research to incorporate even better
algorithms. It is also fast and scales well. Finally, integration with
MySQL is very clean, so it’s easy to issue queries to Sphinx and pick
up the results.
Recent enhancements include an add-on called fSphinx
to make faceted searches faster (through caching) and easier, and
access to Bayesian Sets to find “items similar to this one.” In Sphinx
itself, the team is working to add high availability, include a new
morphology (stemming, etc.) engine that handles German, improve
compression, and make other enhancements.
The day ended with a reception and copious glasses of Monty Widenius’s
notorious licorice-flavored vodka, an ending that distinguishes the
MySQL conference from others for all time.