A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your Big Data in the Cloud with a Hadoop.”
You don’t have to read much industry news to get the sense that “big data” is sliding into the trough of Gartner’s hype curve. That’s natural. Regardless of the technology, the trough of the hype cycle is fed by a familiar set of causes: over-aggressive marketing, the longing for a silver bullet that doesn’t exist, and the desire to spout the newest buzzwords. All of these phenomena breed cynicism. Perhaps the most dangerous is the technologist who never understands the limitations of data, never understands what data isn’t telling you, or never understands that if you ask the wrong questions, you’ll certainly get the wrong answers.
Big data is not a term I’m particularly fond of. It’s just data, regardless of the size. But I do like Roger Magoulas’ definition of “big data”: big data is when the size of the data becomes part of the problem. I like that definition because it scales. It was meaningful in 1960, when “big data” was a couple of megabytes. It will be meaningful in 2030, when we all have petabyte laptops, or eyeglasses connected directly to Google’s yottabyte cloud. It’s not convenient for marketing, I admit; today’s “Big Data!!! With Hadoop And Other Essential Nutrients Added” is tomorrow’s “not so big data, small data actually.” Marketing, for better or for worse, will deal.
Whether or not Moore’s Law continues indefinitely, the real importance of the amazing increase in computing power over the last six decades isn’t that things have gotten faster; it’s that the size of the problems we can solve has gotten much, much larger. Or as Chris Gaun just wrote, big data is leading scientists to ask bigger questions. We’ve been a little too focused on Amdahl’s law, on making computing faster, and not focused enough on the reverse: how big a problem can you solve in a given time, given finite resources? Modern astronomy, physics, and genetics are all inconceivable without really big data, and I mean big on a scale that dwarfs Amazon’s inventory database. At the edges of research, data is, and always will be, part of the problem. Perhaps even the biggest part of the problem.
In the next year, we’ll slog through the cynicism that’s a natural outcome of the hype cycle. But I’m not worried about cynicism. Data isn’t like Java, or Rails, or any of a million other technologies; data has been with us since before computers were invented, and it will still be with us when we move on to whatever comes after digital computing. Data, and specifically “big data,” will always be at the edges of research and understanding. Whether we’re mapping the brain or figuring out how the universe works, the biggest problems will almost always be the ones for which the size of the data is part of the problem. That’s an invariant. That’s why I’m excited about data.