Roll-your-own database architecture

Two years ago, most of the conversations around big data had a futuristic, theoretical vibe. That vibe has been replaced with a gritty sense of practicality. Today, when big data or some surrogate term arises in conversation, the talk is likely to focus not on “what if,” but on “how do we get it done?” and “what will it cost?”

Real-time big data analytics and the increasing need for applications capable of handling mixed read/write workloads — as well as transactions and analytics on “hot” data — are putting new pressures on traditional data management architectures.

What’s driving the need for change? There are several factors, including a new class of apps for personalizing the Internet, serving dynamic content, and creating rich user experiences. These apps are data driven, which means they essentially feed on deep data analytics. To support them, you’ll need a steady supply of activity history, insights, and transactions, plus the ability to combine historical analytics with hot analytics and read/write transactions.

Some people think all the data should sit in one big Hadoop data lake; others think that all data should sit in memory for speed. Here’s another potential scenario: store historical data on cost-effective rotational media, while storing more valuable, recent data on solid-state drives (SSDs). That kind of combined architecture would enable you to run historical analytics on the older data and hot analytics and transactions on the newer and more valuable data.
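To make that scenario concrete, here is a minimal sketch of the idea in Python. It is a toy model, not a description of any vendor's product: the class name TieredStore, the 30-day hot window, and the in-memory dictionaries standing in for SSD and rotational storage are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TieredStore:
    """Toy blended store: recent records on a hot tier, older ones on a cold tier."""

    # Assumed cutoff for what counts as "recent"; the article does not specify one.
    hot_window: timedelta = timedelta(days=30)
    # Plain dicts stand in for SSD-backed (hot) and rotational (cold) storage.
    hot: dict = field(default_factory=dict)
    cold: dict = field(default_factory=dict)

    def put(self, key, value, timestamp):
        # Writes always land on the hot tier; drop any stale cold copy.
        self.cold.pop(key, None)
        self.hot[key] = (timestamp, value)

    def get(self, key):
        # Check the fast tier first, then fall back to the cheap tier.
        if key in self.hot:
            return self.hot[key][1]
        if key in self.cold:
            return self.cold[key][1]
        return None

    def demote_old(self, now):
        # Move records older than the hot window to the cold tier.
        cutoff = now - self.hot_window
        stale = [k for k, (ts, _) in self.hot.items() if ts < cutoff]
        for k in stale:
            self.cold[k] = self.hot.pop(k)
        return len(stale)


if __name__ == "__main__":
    store = TieredStore()
    now = datetime(2014, 6, 1)
    store.put("old-order", {"total": 10}, now - timedelta(days=90))
    store.put("new-order", {"total": 25}, now - timedelta(days=1))
    moved = store.demote_old(now)
    print(moved, sorted(store.hot), sorted(store.cold))
```

In a real deployment the demotion step would likely run as a background job, and historical analytics would scan the cold tier while transactions and hot analytics hit only the hot tier. Where to draw the cutoff is a cost and latency trade-off that depends on the workload.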

It’s no secret that VCs are pouring cash into startups promising a new generation of advanced analytics. The rapidly growing supply — some might say oversupply — of newer analytics will drive even greater adoption and usage, highlighting (and in many instances creating) processing bottlenecks that simply did not exist two years ago. In other words, scale is now a serious issue.

Processing bottlenecks are painful for users, but they create wonderful opportunities for innovators who are willing to explore new database management architectures. This seems like a perfect moment for a new wave of “roll your own” configurations designed specifically to handle unique and unorthodox amalgams of data.

That said, there’s also a window for vendors who can devise smart, powerful, flexible data management architectures that will take you where you need to go without breaking the bank.

Whether you’re doing it yourself or relying on a vendor, blended architectures should let you “scale out” before you scale up, which spares you the embarrassment of flying blindly into terra incognita, waving your checkbook and looking for a consultant to rescue you.

Sooner rather than later, the larger issue of database architecture evolution must be addressed. As we know, evolution doesn’t proceed according to a plan. It moves in fits and starts. At this moment, it seems clear that traditional database architectures are no longer sufficient. What’s less clear, however, is the logical next step.

If your business requires high transactional performance on multi-terabyte datasets, and you’re concerned about how many servers you’re running, a blended architecture might be a step in the right direction.

This post is part of a collaboration between O’Reilly and Aerospike. See our statement of editorial independence.

Photo by Kenny Louie, used under a Creative Commons license.
