Jeff Needham

Haystacks and Headgames

Imagining what the post-big data world might look like.

This post originally published as a chapter from the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

The Demise of Counting

So far, I have been discussing how big data differs from previous methods of computing: how it provides benefits and creates disruptions. Even at this early stage, it is safe to predict that big data will become a multibillion-dollar analytics and BI business and possibly subsume the entire existing commercial ecosystem. During that process, it will have disrupted the economics, behavior, and understanding of everything it analyzes and everyone who touches it, from those who use it to model the biochemistry of personality disorders to agencies that know the color of your underwear.

Big data is going to lay enough groundwork that it will initiate another set of much larger changes to the economics and science of computing. (But the future will always contain elements from the past, so mainframes, tape, and disks will still be with us for a while.) This chapter is going to take a trip into the future and imagine what the post-big data world might look like. The future will require us to process zettabytes and yottabytes of data on million-node clusters. In this world, individual haystacks will be thousands of times the size of the largest Hadoop clusters that will be built in the next decade. We are going to discover what the end of computing might look like, or more precisely, the end of counting.

The first electronic computers were calculators on steroids, but still just calculators. When you had something to calculate, you programmed the machinery, fed it some data, and it did the counting. Early computers that solved mathematical equations for missile trajectory still had to solve these equations using simple math. Solving an equation the way a theoretical physicist might is how human brains solve equations, but computers don’t work like brains. There have been attempts at building computers that mimic the way brains solve equations, but engineering constraints make it more practical to build a hyperactive calculator that solves equations through brute force and ignorance.
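To make the "hyperactive calculator" idea concrete, here is a minimal sketch of a trajectory solved by brute force: nothing but repeated additions and multiplications over tiny time steps, compared against the closed-form answer a physicist would write down. The function names, the drag-free model, and the step size are assumptions chosen for illustration, not anything taken from the report.

```python
def brute_force_range(vx, vy, g=9.81, dt=0.001):
    """Horizontal distance travelled by a drag-free projectile.

    Hypothetical illustration: the machine never "solves" the equation,
    it just steps time forward until the projectile is back on the ground.
    """
    x, y = 0.0, 0.0
    while y >= 0.0:
        x += vx * dt   # move sideways a little
        y += vy * dt   # move up or down a little
        vy -= g * dt   # gravity slows the climb, then speeds the fall
    return x


def closed_form_range(vx, vy, g=9.81):
    """The same answer derived symbolically: range = 2 * vx * vy / g."""
    return 2.0 * vx * vy / g


if __name__ == "__main__":
    print(brute_force_range(100.0, 100.0))
    print(closed_form_range(100.0, 100.0))
```

The two results agree to within a fraction of a meter, and the gap shrinks as `dt` shrinks, which is the trade the passage describes: more counting in exchange for not having to reason like a physicist.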


Cloudy with a Chance of Meatballs: When Clouds Meet Big Data

Companies facing Internet-scale or enterprise-scale issues will need to identify their requirements for cloud computing and big data.

This post originally published as a chapter from the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

The Big Tease

As scientific and commercial supercomputing collide with public and private clouds, the ability to design and operate data centers full of computers is poorly understood by enterprises not used to handling 300 million anythings. The promise of a fully elastic and cost-effective computing plant is quite seductive, but Yahoo!, Google, and Facebook solved these problems on their own terms. More conventional enterprises that are now facing either internet-scale computing or a desire to improve the efficiency of their enterprise-scale physical plant will need to identify their own requirements for cloud computing and big data.

Conventional clouds are a form of platform engineering designed to meet very specific and mostly operational requirements. Many clouds are designed by residents of silos that only value the requirements of their own silo. Clouds, like any platform, can be designed to meet a variety of requirements beyond the purely operational. Everyone wants an elastic platform (or cloud), but as discussed in “Big Data: The Ultimate Computing Platform,” designing platforms at internet scale always comes with trade-offs, and elasticity does not come free or easy. Big data clouds must meet stringent performance and scalability expectations, which require a very different form of cloud.

The idea of clouds “meeting” big data or big data “living in” clouds isn’t just marketing hype. Because big data followed so closely on the trend of cloud computing, both customers and vendors still struggle to understand the differences from their enterprise-centric perspectives. On the surface there are physical similarities in the two technologies—racks of cloud servers and racks of Hadoop servers are constructed from the same physical components. But Hadoop transforms those servers into a single 1000-node supercomputer, whereas conventional clouds host thousands of private mailboxes.


The Reservoir of Data

A reservoir that finally aggregates data in a single, scalable repository for a single analytic view will be the most important legacy of big data.

This post originally published as a chapter from the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

The Actual Internet

We wouldn’t be talking about big data at all if it weren’t for the “explosion” of the internet. Several technologies that were drifting around in the 1980s eventually converged to make the first boom possible. Mainstream consumer culture experienced it as if the boom came from nowhere. Since the 1990s, the internet has taken a few more evolutionary steps. Running a business or computing plant at internet scale had never been done before Yahoo! and then Google and Facebook attempted it. They solved many engineering problems that arose while taking commercial supercomputing from enterprise scale to internet scale. But as Yahoo! has since demonstrated, making a sustainably profitable business out of internet-scale computing is a different matter.

Traditional enterprises (companies that make films, 737s, or soap) are for the first time experiencing internet-scale computing problems, but they’re still stuck with their decades-old, entrenched approach to enterprise-scale computing. For those who remember what happened in the 1990s (or, more to the point, what didn’t happen), skepticism about the Miracle of Big Data is justified. Taken from the perspective that early technologies (for example, Java, Apache, or anything involving billions of users) are always unproven, the first boom is always going to be wishful thinking. And there was a lot of wishful thinking going on in the 1990s.

Many startup companies built prototypes using early technologies like the programming language Java, which made it easier to quickly develop applications. If a startup’s idea caught on, then the problem of too many customers quickly overwhelmed the designers’ intentions. Good problem to have. Building platforms to scale requires a lot of scaffolding “tax” up front, and although a startup might wish for too many customers, building a system from the get-go to handle millions of customers was expensive, complex, and optimistic even for Silicon Valley startups in the 1990s.


Organizations: The Other Platform

A socially productive IT organization is a prerequisite for success with big data.

This post originally published as a chapter from the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

From Myelin to Metal

The world of advanced big data platforms is a strange place. Like a Gilbert and Sullivan musical, there is drama, farce, and mayhem in every act. Once in a long while, the curtain rises, time stands still, and as if by magic, it all works. Platform engineering at internet scale is an art form—a delicate balance of craft, money, personalities, and politics. With the commoditization of IT, however, there is much less craft and little art. Studies have shown that 60 to 80 percent of all IT projects fail with billions of dollars wasted annually. The end results are not simply inefficient, but frequently unusable. Projects that do finish are often late, over budget, or missing most of their requirements.

There is immense pressure on CIOs to convert their IT infrastructure into something as commodity as the plumbing in their office buildings. Deploying platforms on the scale required for cloud computing or big data will be among the most complex projects IT groups undertake. Managing complex projects of this magnitude requires a healthy IT culture not only to ensure the successful discovery of the insights the business craves, but also to continuously deliver those insights in a cost-effective way. Computing platforms deeply impact the corporation they serve, not to mention the end users, vendors, partners, and shareholders. This Möbius strip of humanity and technology lies at the heart of the very model of a modern major enterprise. A socially productive IT organization is a prerequisite for success with big data.

Humans organized themselves into hierarchies well before the water cooler appeared. In a corporate organization, hierarchies try to balance the specialization of labor and details only specialists worry about by distilling minutiae so that leaders can make informed business decisions without being confused or overwhelmed. Distilling minutiae relies on preserving the right amount of detail and abstracting the rest. Because details are not created equal, the goal of abstraction is to prioritize the right details and mask the ones that cause confusion and fear, both of which do a crackerjack job of impairing judgment. When abstraction is done well, a lot of good decisions can be made very quickly, and course corrections can sometimes make up for bad decisions. Since organizations are made up of people whose motivation, emotions, and behavior combine with their understanding of topics to produce those judgments, it is rarely done well, let alone efficiently.


Big Data: The Ultimate Computing Platform

From interconnected hardware and software platforms to engineering and product development platforms, big data is DIY supercomputing.

This post originally published as a chapter from the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

Introduction to Platforms

A platform is a collection of sub-systems or components that must operate like one thing. A Formula One racing vehicle (which drivers refer to as their platform) is the automobile equivalent of a supercomputer. It has every aspect of its design fully optimized not simply for performance, but for performance per liter of gas or kilogram of curb weight. A 2-liter engine that creates 320 HP instead of 140 HP does so because it is more efficient. The engine with higher horsepower does have better absolute performance, but performance really means efficiency, such as HP/kg and miles/gallon, or with computing platforms, jobs executed/watt. Performance is always measured as a ratio of something being accomplished for the effort expended.
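As a rough illustration of performance as a ratio, the sketch below divides what was accomplished by the effort expended. The engine figures (320 HP and 140 HP from 2 liters) come from the paragraph above; the helper name `per_unit` and the cluster numbers are invented purely for the example.

```python
def per_unit(accomplished, effort):
    """Performance as a ratio: something accomplished per unit of effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return accomplished / effort


if __name__ == "__main__":
    # Figures from the text: two 2-liter engines.
    print(per_unit(320, 2.0), "HP per liter")  # tuned engine
    print(per_unit(140, 2.0), "HP per liter")  # ordinary engine

    # Hypothetical computing analogue (numbers are made up):
    # two clusters finish the same 1,000 jobs on different power budgets.
    print(per_unit(1000, 500.0), "jobs per watt")
    print(per_unit(1000, 2000.0), "jobs per watt")
```

Both engines have the same displacement, so the ratio shows the efficiency gap directly: 160 HP per liter against 70 HP per liter.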

The descendant of Honda’s F1 technology is now found in other cars because optimized technology derived from the racing program enabled Honda to design more powerful vehicles for consumers. A Honda Civic is just as much a platform as the F1. The engine, brakes, steering, and suspension are designed so it feels like you’re driving a car, not a collection of complex sub-components. Platforms can span rivers, serve ads for shoes, and reserve seats on another notable platform—the kind with wings.


The Wall of Water

"Agencies & enterprises will find answers to questions they could never afford to ask; big data will help identify questions they never knew to ask."

This post originally published as a chapter from the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

And Then There Was One

And then there was one: one ecosystem, one platform, one community, and most importantly, one force that retrains vendors to think about customers again. Welcome to the tsunami that Hadoop, NoSQL, and all internet-scale computing represent. Some enterprises, whose businesses don’t appear to be about data at all, are far from the shoreline where the sirens are faint. Other organizations that have been splashing around in the surf for decades now find themselves watching the tide recede rapidly. The speed of the approaching wave is unprecedented, even for an industry that has been committed, like any decent youth movement, to innovation, self-destruction, and reinvention. Welcome to the future of computing. Welcome to big data. Welcome to the end of computing as we have known it for 70 years.

Big data is a type of supercomputing for commercial enterprises and governments that will make it possible to monitor a pandemic as it happens, anticipate where the next bank robbery will occur, optimize fast food supply chains, predict voter behavior on election day, and forecast the volatility of political uprisings while they are happening. The course of economic history will change when, not if, criminals stand up their Hadoop clusters. So many seemingly diverse and unrelated global activities will become part of the big data ecosystem. Like any powerful technology, in the right hands, it propels us toward limitless possibilities. In the wrong hands, the consequences can be unimaginably destructive.

The motivation to get big data is immediate for many organizations. If a threatening organization gets the tech first, then the other organization is in serious trouble. If Target gets it before Kohl’s, or the Chinese navy gets it before the US Navy, or criminal organizations get it before banks, then they will have a powerful advantage. The solutions will require enterprises to be innovative at many levels, including technical, financial, and organizational. As in the 1950s during the Cold War, whoever masters big data will win the arms race, and big data is the arms race of the 21st century.


Six disruptive possibilities from big data

Specific ways big data will inundate vendors and customers.

My new book, Disruptive Possibilities: How Big Data Changes Everything, is derived directly from my experience as a performance and platform architect in the old enterprise world and the new, Internet-scale world.

I pre-date the Hadoop crew at Yahoo!, but I intimately understood the grid engineering that made Hadoop possible. For years, the working title of this book was The Art and Craft of Platform Engineering, and when I started working on Hadoop after a stint in the Red Hat kernel group, many of the ideas that were jammed into my head, going back to my experience with early supercomputers, all seemed to make perfect sense for Hadoop. This is why I frequently refer to big data as “commercial supercomputing.”

In Disruptive Possibilities, I discuss the implications of the big data ecosystem over the next few years. These implications will inundate vendors and customers in a number of ways.
