MySQL in 2012: Report from Percona Live

Checking in on the state of MySQL.

The big annual MySQL conference, started by MySQL AB in 2003 and run
by my company O’Reilly for several years, lives on under the able
management of Percona. This
fast-growing company started out doing consulting on MySQL,
particularly in the area of performance, and branched out into
development and many other activities. The principals of this company
wrote the two most recent editions of the popular O’Reilly book High
Performance MySQL.

Percona started offering conferences a couple years ago and decided to
step in when O’Reilly decided not to run the annual MySQL conference
any more. Oracle did not participate in Percona Live, but has
announced its own MySQL conference for next September.

Percona Live struck me as a success, with about one thousand attendees
and the participation of leading experts from all over the MySQL
world, save for Oracle itself. The big players in the MySQL user
community came out in force: Facebook, HP, Google, Pinterest (the
current darling of the financial crowd), and so on.

The conference followed the pattern laid down by its predecessors in
just about every way, with the same venue (the Santa Clara Convention
Center, which is near a light-rail but nothing else of interest), the
same food (scrumptious), the standard format of one day of tutorials
and two days of sessions (although with an extra developer day tacked
on, which I will describe later), an expo hall (smaller than before,
but with key participants in the ecosystem), and even community awards
(O’Reilly Media won an award as Corporate Contributor of the Year).
Monty Widenius was back as always with a MariaDB entourage, so it
seemed like old times. The keynotes seemed less well attended than the
ones from previous conferences, but the crowd was persistent and
showed up in impressive numbers for the final events–and I don’t
believe it was because everybody thought they might win one of the
door prizes.

Jeremy Zawodny ready to hand out awards.

Two contrasting database deployments

I checked out two well-attended talks by system architects from two
high-traffic sites: Pinterest and craigslist. The radically divergent
paths they took illustrate the range of options open to data centers
nowadays–and the importance of studying these options so a data
center can choose the path appropriate to its mission and
applications.

Jeremy Zawodny (co-author of the first edition of High Performance
MySQL) presented the design of craigslist’s site, which illustrates
the model of software accretion over time and an eager embrace of
heterogeneity. Among their components are:

  • Memcache, lying between the web servers and the MySQL database in
    classic fashion.

  • MySQL to serve live postings, handle abuse, provide data for the
    monitoring system, and meet other immediate needs.

  • MongoDB to store almost 3 billion items related to archived (no longer
    live) postings.

  • HAproxy to direct requests to the proper MySQL server in a cluster.

  • Sphinx for text searches, with indexes over all live postings,
    archived postings, and forums.

  • Redis for temporary items such as counters and blobs.

  • An XFS filesystem for images.

  • Other helper functions that Zawodny lumped together as “async
    services.”

Care and feeding of this menagerie becomes a job all in itself.
Although craigslist hires enough developers to assign them to
different areas of expertise, they have also built an object layer
that understands MySQL, the cache, Sphinx, and MongoDB. The original purpose
of this layer was to aid in migrating old data from MySQL to MongoDB
(a procedure Zawodny admitted was painful and time-consuming) but it
was extended into a useful framework that most developers can use
every day.

Zawodny praised MySQL’s durability and its method of replication. But
he admitted that they also use MySQL because it was present when they
started and they were familiar with it. The newer entrants in the data
store arena, by contrast, were by no means adopted haphazardly or to
try out cool new tools. Each one precisely meets particular needs of
the site.

For instance, besides being fast and offering built-in sharding,
MongoDB was appealing because they don’t have to run ALTER TABLE every
time they add a new field to the database. Old entries coexist happily
with newer ones that have different fields. Zawodny also likes using a
Perl client to interact with a database, and the Perl client provided
by MongoDB is unusually robust because it was developed by 10gen
directly, in contrast to many other datastores where Perl was added by
some random volunteer.

The architecture at craigslist was shrewdly chosen to match their
needs. For instance, because most visitors click on the limited set
of current listings, the Memcache layer handles the vast majority of
hits and the MySQL database has a relatively light load.
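The Memcache layer described above is the classic cache-aside pattern. This is a minimal sketch of it; the client objects, key scheme, and `fetch_posting` helper are hypothetical names for illustration, not craigslist's actual code.

```python
# Cache-aside sketch: serve hot listings from Memcache, fall through to
# MySQL on a miss, and populate the cache for subsequent readers.
# The cache/db client interfaces here are illustrative assumptions.

def get_posting(posting_id, cache, db, ttl=300):
    """Return a posting, consulting the cache before the database."""
    key = "posting:%d" % posting_id
    posting = cache.get(key)
    if posting is None:
        # Cache miss: hit MySQL once, then cache the result with a TTL.
        posting = db.fetch_posting(posting_id)  # hypothetical DB helper
        cache.set(key, posting, ttl)
    return posting
```

Because most visitors view the same limited set of current listings, nearly every call after the first is served from the cache, which is what keeps the MySQL load light.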

However, the MySQL deployment is also carefully designed. Clusters are
vertically partitioned in two nested ways. First, different types of
items are stored on separate partitions. Then, within each type, the
nodes are further divided by the type of query:

  • A single master to handle all writes.

  • A group for very fast reads (such as lookups on a primary key).

  • A group for “long reads” taking a few seconds.

  • A special node called a “thrash handler” for rare, very complex
    queries.

It’s up to the application to indicate what kind of query it is
issuing, and HAproxy interprets this information to direct the query
to the proper set of nodes.
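One common way to let an application tag its queries for a proxy is to expose each node group behind its own HAproxy frontend port; the sketch below assumes that arrangement. The hostname, port numbers, and query-class names are hypothetical, since the talk did not detail the exact mechanism.

```python
# Application-side routing sketch: pick an HAproxy frontend by query class.
# HAproxy then forwards each frontend port to the matching backend pool
# (master, fast-read replicas, long-read replicas, or the thrash handler).
# All names and ports below are illustrative assumptions.

HAPROXY_HOST = "db-proxy.internal"  # hypothetical proxy hostname

PORTS_BY_QUERY_CLASS = {
    "write": 3306,       # single master handling all writes
    "fast_read": 3307,   # primary-key lookups
    "long_read": 3308,   # reads taking a few seconds
    "thrash": 3309,      # rare, very complex queries
}

def connection_params(query_class):
    """Return (host, port) of the HAproxy frontend for this query class."""
    return HAPROXY_HOST, PORTS_BY_QUERY_CLASS[query_class]
```

The point of the design is that the routing decision lives in the application, where the nature of the query is known, while HAproxy only has to map ports to node groups.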

Naturally, redundancy is built in at every stage (three HAproxy
instances used in round robin, two Memcache instances holding the same
data, two data centers for the MongoDB archive, etc.).

It’s also interesting which recent developments craigslist has
eschewed. They self-host everything and use no virtualization.
Zawodny admits this leads to inefficient use of hardware, but it
avoids the overhead associated with virtualization. For efficiency,
they have switched to SSDs, allowing them to scale down from 20
servers to only 3. They don’t use a CDN, finding that with aggressive
caching and good capacity planning they can handle the load
themselves. They send backups and logs to a SAN.

Let’s turn now from the teeming environment of craigslist to the
decidedly lean operation of Pinterest, a much younger and smaller
organization. As presented by Marty Weiner and Yashh Nelapati, when
they started web-scale growth in the autumn of 2011, they reacted
somewhat like craigslist, but with much less thinking ahead, throwing
in all sorts of software such as Cassandra and MongoDB and
diversifying a bit recklessly. Finally they came to their senses and
went on a design diet. Their resolution was to focus on MySQL–but the
way they made it work is unique to their data and application.

They decided against using a cluster, afraid that bad application code
could crash everything. Sharding is much simpler and doesn’t require
much maintenance. Their advice for implementing MySQL sharding
included:

  • Make sure you have a stable schema, and don’t add features for a
    couple months.

  • Remove all joins and complex queries for a while.

  • Do simple shards first, such as moving a huge table into its own
    database.

They use Pyres, a
Python clone of Resque, to move data into shards.

However, sharding imposes severe constraints that led them to
hand-crafted work-arounds.

Many sites want to leave open the possibility for moving data between
shards. This is useful, for instance, if they shard along some
dimension such as age or country, and they suddenly experience a rush
of new people in their 60s or from China. The implementation of such a
plan requires a good deal of coding, described in the O’Reilly book
MySQL High Availability, including the creation of a service that just
accepts IDs and determines what shard currently contains the ID.

The Pinterest staff decided the ID service would introduce a single
point of failure, and decided just to hard-code a shard ID in every ID
assigned to a row. This means they never move data between shards,
although shards can be moved bodily to new nodes. I think this works
for Pinterest because they shard on arbitrary IDs and don’t have a
need to rebalance shards.
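Hard-coding a shard ID into every row ID can be done by reserving the high bits of a 64-bit integer for the shard. The bit widths below are illustrative assumptions, not Pinterest's published layout:

```python
# Sketch: pack a shard ID into the high bits of every 64-bit row ID, so
# any server can locate a row's shard with a bit shift instead of asking
# a central ID-lookup service. Bit widths are illustrative assumptions.

SHARD_BITS = 16  # up to 65,536 shards
LOCAL_BITS = 48  # per-shard row counter

def make_id(shard_id, local_id):
    """Combine a shard ID and a per-shard row ID into one 64-bit integer."""
    assert 0 <= shard_id < (1 << SHARD_BITS)
    assert 0 <= local_id < (1 << LOCAL_BITS)
    return (shard_id << LOCAL_BITS) | local_id

def shard_for(row_id):
    """Recover the owning shard from a row ID: no ID service, no lookup."""
    return row_id >> LOCAL_BITS
```

The trade-off is exactly the one noted above: the shard assignment is frozen into the ID forever, so rows can never migrate between shards, though a whole shard can still be moved to a new node.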

Even more interesting is how they avoid joins. Suppose they want to
retrieve all pins associated with a certain board associated with a
certain user. In classical, normalized relational database practice,
they’d have to join the board, pin, and user tables. But
Pinterest maintains extra mapping tables. One table maps users to
boards, while another maps boards to pins. They query the
user-to-board table to get the right board, query the board-to-pin
table to get the right pin, and then do simple queries without joins
on the tables with the real data. In a way, they implement a custom
NoSQL model on top of a relational database.
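The mapping-table lookup chain described above can be sketched as three simple queries, using SQLite here as a stand-in for MySQL; the table and column names are illustrative assumptions:

```python
# Sketch of the join-free lookup pattern: walk user -> boards -> pins
# through mapping tables with three simple queries instead of one
# three-way join. SQLite stands in for MySQL; schema is illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user_boards (user_id INTEGER, board_id INTEGER);
    CREATE TABLE board_pins  (board_id INTEGER, pin_id INTEGER);
    CREATE TABLE pins        (pin_id INTEGER PRIMARY KEY, caption TEXT);
    INSERT INTO user_boards VALUES (1, 10);
    INSERT INTO board_pins  VALUES (10, 100), (10, 101);
    INSERT INTO pins VALUES (100, 'sunset'), (101, 'recipe');
""")

def pins_for_user(user_id):
    """Fetch a user's pin captions without any SQL joins."""
    boards = [r[0] for r in db.execute(
        "SELECT board_id FROM user_boards WHERE user_id = ?", (user_id,))]
    placeholders = ",".join("?" * len(boards))
    pin_ids = [r[0] for r in db.execute(
        "SELECT pin_id FROM board_pins WHERE board_id IN (%s) "
        "ORDER BY pin_id" % placeholders, boards)]
    return [db.execute("SELECT caption FROM pins WHERE pin_id = ?",
                       (p,)).fetchone()[0] for p in pin_ids]
```

In a sharded deployment this matters because each of the three queries touches a single table and can be sent to a single shard, which a cross-table join could not.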

Pinterest does use Memcache and Redis in addition to MySQL. As with
craigslist, they find that most queries can be handled by Memcache.
And the actual images are stored in S3, an interesting choice for a
site that is already enormous.

It seems to me that the data and application design behind Pinterest
would have made it a good candidate for a non-ACID datastore. They
chose to stick with MySQL, but like organizations that use NoSQL
solutions, they relinquished key aspects of the relational way of
doing things. They made calculated trade-offs that worked for their
particular needs.

My take-away from these two fascinating and well-attended talks was
that you must understand your application, its scaling and performance
needs, and its data structure in order to know what you can sacrifice
and which solution gives you your sweet spot. craigslist solved its
problem through the very precise application of different tools, each
with a particular job that fulfilled craigslist’s requirements.
Pinterest made its own calculations and found an entirely different
solution, depending on some clever hand-coding instead of
off-the-shelf tools.

Current and future MySQL

The conference keynotes surveyed the state of MySQL and some
predictions about where it will go.

Conference co-chair Sarah Novotny at the keynotes.

The world of MySQL is much more complicated than it was a couple years
ago, before Percona got heavily into the work of releasing patches to
InnoDB, before they created entirely new pieces of software, and
before Monty started MariaDB with the express goal of making a better
MySQL than MySQL. You can now choose among Oracle’s official MySQL
releases, Percona’s supported version, and MariaDB’s supported
version. Because these are all open source, a major user such as
Facebook can even apply patches to get the newest features.

Nor are these different versions true forks, because Percona and
MariaDB create their enhancements as patches that they pass back to
Oracle, and Oracle is happy to include many of them in a later
release. I haven’t even touched on the commercial ecosystem around
MySQL, which I’ll look at later in this article.

In his opening keynote, Percona founder Peter Zaitsev praised the latest MySQL
release by Oracle. With graceful balance he expressed pleasure that
the features most users need are in the open (community) edition, but
allowed that the proprietary extensions are useful too. In short, he
declared that MySQL is less buggy and has more features than ever.

The former CEO of MySQL AB, Mårten Mickos, also found that MySQL is
doing well under Oracle’s wing. His one criticism was that Oracle fails
to work as well as it should with potential partners (by which I
assume he meant Percona and MariaDB). He lauded their community
managers but said the rest of the company should support them more.

Keynote by Mårten Mickos.

Brian Aker presented an OpenStack MySQL service developed by his current
employer, Hewlett-Packard. His keynote retold the story that had led
over the years to his developing Drizzle (a true fork of MySQL
that tries to return it to its lightweight, Web-friendly roots) and
eventually working on cloud computing for HP. He described modularity,
effective use of multiple cores, and cloud deployment as the future of
databases.

A panel on the second day of the conference brought together
high-level managers from many of the companies that have entered the
MySQL space from a variety of directions to discuss the database
engine’s future. Like most panels, the conversation ranged over a
variety of topics–NoSQL, modular architecture, cloud computing–but
reached some depth only on the topic of security, which was not
represented very strongly at the conference and was discussed here at
the insistence of Slavik Markovich of McAfee.

Keynote by Brian Aker.