Haystacks and Headgames

Imagining what the post-big data world might look like.

This post was originally published as a chapter of the free Radar report, Disruptive Possibilities: How Big Data Changes Everything.

The Demise of Counting

So far, I have been discussing how big data differs from previous methods of computing–how it provides benefits and creates disruptions. Even at this early stage, it is safe to predict that big data will become a multibillion-dollar analytics and BI business and possibly subsume the entire existing commercial ecosystem. During that process, it will have disrupted the economics, behavior, and understanding of everything it analyzes and everyone who touches it–from those who use it to model the biochemistry of personality disorders to agencies that know the color of your underwear.

Big data is going to lay enough groundwork that it will initiate another set of much larger changes to the economics and science of computing. (But the future will always contain elements from the past, so mainframes, tape, and disks will still be with us for a while.) This chapter is going to take a trip into the future and imagine what the post-big data world might look like. The future will require us to process zettabytes and yottabytes of data on million-node clusters. In this world, individual haystacks will be thousands of times the size of the largest Hadoop clusters that will be built in the next decade. We are going to discover what the end of computing might look like, or more precisely, the end of counting.

The first electronic computers were calculators on steroids, but still just calculators. When you had something to calculate, you programmed the machinery, fed it some data, and it did the counting. Early computers that solved mathematical equations for missile trajectories still had to work through those equations using simple arithmetic. A theoretical physicist might solve an equation analytically, the way a human brain does, but computers don’t work like brains. There have been attempts at building computers that mimic the way brains solve equations, but engineering constraints make it more practical to build a hyperactive calculator that solves equations through brute force and ignorance.

Modern processors embody that brute force (i.e., clock speed) and ignorance: simple electronics that add, subtract, multiply, and divide on every clock tick. On a good day, that is roughly 12 billion operations per second per core. If a 16-core processor could be fed data fast enough (hint: it can’t), that would be 192 billion calculations per second. The software that makes the algebraic approach possible still runs on these crazy counting machines. Lucky for us, we can still get a lot accomplished with brute force and ignorance.
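To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The per-core rate is the round number used above and the core count is the one quoted in the text; neither is a benchmark of any real processor.

```python
# Back-of-the-envelope peak throughput using the round numbers from the text.
# These are illustrative assumptions, not measurements of a real CPU.
ops_per_core_per_second = 12_000_000_000   # ~12 billion simple operations per second
cores = 16                                 # a hypothetical 16-core processor

peak_ops_per_second = ops_per_core_per_second * cores
print(f"Theoretical peak: {peak_ops_per_second:,} operations/second")
# Theoretical peak: 192,000,000,000 operations/second
# In practice, memory can't feed the cores this fast, so real throughput is far lower.
```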

Scientific computing clusters have to solve problems so immense that they are constructed from thousands of multi-core processors. The NOAA weather forecasting supercomputer is a good example, but despite its immense computing power, weather forecasters still long for a machine that is hundreds of thousands of times more powerful. Hadoop supercomputers follow in the architectural footsteps of these powerhouse modeling machines, but instead of predicting hurricanes, commercial supercomputers search through haystacks of data for patterns. They’re looking to see if you bought Prada shoes after your Facebook friend bought a pair. In order to deliver more targeted consumers to advertisers, Facebook’s Hadoop supercomputers break your data and your friends’ information haystacks into thousands of piles and analyze every single piece of straw, looking for connections and patterns among shopping habits and social behavior.
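As a toy sketch of that divide-and-scan pattern (this is not Facebook's actual pipeline; the records, field names, and matching rule are all invented for illustration), a small Python map/reduce over purchase records might look like this:

```python
from collections import Counter
from multiprocessing import Pool

# Hypothetical purchase records: (buyer, referring_friend, product). All made up.
records = [
    ("alice", "bob",   "prada_shoes"),
    ("bob",   None,    "prada_shoes"),
    ("carol", "alice", "prada_shoes"),
    ("dave",  None,    "sneakers"),
]

def scan_pile(pile):
    """Map step: count purchases that followed a friend's purchase of the same item."""
    counts = Counter()
    for buyer, friend, product in pile:
        if friend is not None:      # a friend bought it first (pre-joined in this toy data)
            counts[product] += 1
    return counts

def split(data, n):
    """Break the haystack into n roughly equal piles."""
    return [data[i::n] for i in range(n)]

if __name__ == "__main__":
    with Pool(2) as pool:                        # scan the piles in parallel
        partials = pool.map(scan_pile, split(records, 2))
    totals = sum(partials, Counter())            # reduce step: merge the partial counts
    print(totals)                                # Counter({'prada_shoes': 2})
```

A real Hadoop job distributes the piles across thousands of machines instead of two local processes, but the shape of the computation is the same.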

Early experience with data pattern analysis began to reveal connections and answer questions that nobody had previously thought to ask. From electoral campaigns to insurance providers and restaurant chains, everyone has discovered new questions to ask of big data. Computing platforms built on brute force and ignorance were suddenly becoming a lot less ignorant. Traditional supercomputers are still calculators, whereas Hadoop supercomputers are pattern explorers. Computers aren’t supposed to be able to predict social behavior.

Another aspect of computing in the 21st century is the rapid and relentless accumulation of data, and big data is just the leading wave of this tsunami. As network connection speeds to consumers increase, data transforms from basic text and logs to graphics and eventually HD video in 3D. With 40 to 70 percent of the human brain dedicated to visual processing, the web will continue to become a high-resolution visual experience, because that is the fastest way to transmit information to a human (for example, it’s faster to tell time from an analog watch than it is to read digits). That shift will further disrupt what it means to broadcast and entertain.

Text consumes little space—this small book would fit onto an old floppy disk if I could find one—but visual information requires millions of megabytes. Processing visual data and eventually “teaching” computers to “see” images starts to hint at the computational problems ahead. In a few short years, Hadoop has gone from looking at strings of characters to macro-blocks of video as it transitions from being a tool for discovery to a tool for seeing and understanding what it processes.

Lose the Haystack

Big data tools will continue to be useful for a few more years before they usher in the era of Super-Sized Data. Advances in conventional computing technology will extend the life of many clusters. Processors within early clusters were expensive, consumed a lot of power, and were designed to run software from another era. In commercial supercomputing, analysis is limited by how fast data can be pulled into a processor, quickly scanned for patterns, and then discarded to make room for the next chunk. Processors designed for that other era end up working as overheated, underemployed memory controllers.

The coming generation of extremely low-power, cheap processors that are optimized for discovering (not calculating) exabytes of data will make it possible to build 100,000-node clusters. A cluster this size will again push the frontiers of what can be discovered, but in a world where exabytes of high-resolution visual data need to be “seen,” the superior scalability first made famous by scientific clusters and then by Hadoop will not be enough. We need to stop looking for the needle in the haystack by looking at every piece of straw—we need to stop counting.

The current approach to computing is based on ideas from John von Neumann, who is generally credited with the architecture that computers still use today. In a Hadoop cluster, every piece of data is still inspected, and if the pieces remain small, 100,000-node ARM clusters will be able to extend the shelf life of current “inspecting” clusters. If each piece of hay is a tweet or a zip code, then the pieces are small. If each piece of hay is three hours of full-motion HD video, the problem of inspecting it all starts to move out of reach even for these massive clusters. When the cluster has to “see” all the data instead of just inspecting it, we need more scalable strategies.

Mind Another Gap

The human brain has long been recognized as a unique form of supercomputer. In a crude sense, it is a 50 billion-node cluster that consists of very high-resolution analog logic combined with a digital FM transmission fabric between all the nodes. A neuron fires when enough chemical signal has accumulated across its synaptic gaps. It can be held in a state where it is waiting for the last molecule of neurotransmitter to arrive; when that molecule is dumped into one of its many synaptic gaps, the neuron finally fires.
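As a toy illustration of that threshold behavior (a caricature in the integrate-and-fire spirit; the threshold and signal sizes are arbitrary, and real neurons are vastly more complicated):

```python
class ToyNeuron:
    """Accumulates incoming signal until a threshold is crossed, then fires and resets."""

    def __init__(self, threshold=100):
        self.threshold = threshold
        self.accumulated = 0

    def receive(self, amount):
        """Add neurotransmitter-like input; return True on the tick that makes it fire."""
        self.accumulated += amount
        if self.accumulated >= self.threshold:
            self.accumulated = 0     # reset after firing
            return True
        return False

neuron = ToyNeuron(threshold=100)
signals = [30, 30, 39, 1]                    # held just below threshold until the last unit
print([neuron.receive(s) for s in signals])  # [False, False, False, True]
```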

Your 50-billion-node supercomputer has neurons with a trigger resolution sensitive to a single molecule, and there are hundreds of trillions of molecules in your brain. Some of those neurochemicals come from the things you ingest (your café latte included), and some come from the RNA of the cells responsible for creating the amino acids that become neurotransmitters. Problems like why people get Alzheimer’s are not yet within reach. In Alzheimer’s, the person’s own brain chemistry attacks the wiring between the neurons; the disease destroys the fabric and so destroys the person. Researchers believe that each neural cell is a sophisticated chemical computer in its own right, and all of these genetic and environmental chemical processes band together to make up your charming self.

Combine that chemistry with the sheer number of neurons and their interconnections, and modeling a human brain accurately enough to answer questions about why people develop schizophrenia, fall in love, or feel more productive after drinking coffee requires simulating all of the brain’s biochemistry. The computing technology needed for that lies far beyond even the largest conventional supercomputers we could ever construct. Like weather forecasters, neurologists can never have enough computing power.

The exponential growth in the amount of data and the scope of problems that need attention must be met with an exponential advancement in the science of counting things. In the future, there will be many problems that a modest 100 million-node Hadoop cluster can tackle, but modeling the human cortex won’t be one of them—we have to stop counting things. So, what might the next step be for computing? It will still involve some of the parlor tricks already found in Hadoop clusters, but will also need to steal a few from the supercomputer you are using now—your noodle.

Believing Is Seeing

My personal interest in neuroscience started because I was born with Sturge-Weber Syndrome, a rare developmental disorder. Nobody knows what causes SWS yet, so it’s a good candidate to simulate when we get our brain cluster built. Its most common symptom is a port-wine stain on the face. The birthmark is caused by an overabundance of capillaries just under the surface of the skin. The instructions for creating the right number of capillaries somehow get messed up during development. If the foul-up occurs along the outer branches of the fifth cranial nerve, the result is a port-wine stain.

If the mistake occurs further up the line and closer to the cortex, then the brain itself ends up with a port-wine stain of sorts, and this is when the syndrome becomes lethal. Some newborns with the cortical version of SWS have seizures right after birth and must have sections of their cortex removed with a hemispherectomy in order to stop the seizures. It is a testament to both the extraordinary elasticity of the cortex and the relentless, built-in will to survive that some patients can survive this operation, recover, and thrive.

My SWS affected the retina of my right eye. The human eye has a complex internal network of capillaries that nourish the photoreceptors. These vessels sit in front of the retina (which is why you see flashes of blood vessels when a doctor shines a light in). My port-wine stain wasn’t in my cortex, but it was on my face and in my retina. My right eye wasn’t very healthy, and I eventually lost sight in it when I was 12. Adolescence is a tough time for many people, but I also had to relearn how to see and do simple things like hit a baseball or walk down a crowded hallway. I would only later understand the neuroscience and cognitive processes that were taking place, but it turned out that I spent my wasted youth rewiring my cortex.

Simple things like walking down that hallway became my supercomputing problem to solve. As a kid, I was a good baseball player, but I fell into a hitting slump after I became monocular. As I slowly relearned how to hit, at some point it became easier because I stopped “seeing” with my eye and started seeing with my muscles and body position. Coaches call it spatial intelligence and encourage players to develop that skill for many sports, including hitting and pitching. My brain had to become more spatially intelligent just to walk down the hallway. Locating a pitch in the strike zone (within an inch or so) requires precise and repeatable positioning of the body in space, which is why pitchers are usually born with a natural spatial intelligence. I have a crude version of this intelligence, but nowhere near the spatial genius that is Mariano Rivera.

The other change my brain had to make was a greater reliance on the right hemisphere, which is responsible for spatial processing. This hemisphere must do a lot more spatial work when it is fed with only one eye’s worth of data. The brain is a stereo device, so with one eye, half the data went missing, and another chunk was lost going from stereo to mono vision. In order to see, I have to imagine, or model, a 3D world. Over time, my brain learned to synthesize a three-dimensional world from a limited amount of two-dimensional data. I was born left-handed, and when I became left-eyed at 12, I became right-brained for both seeing and thinking.

A good example of how well my (and other brains) can spatially model the world occurs when I have to drive at night in the rain. I should not be able to do this, and must fully exploit every spatial trick to pull it off. I battle with low light (no data), deep shadows (contrast flare), and reflections (visual noise) from wet road surfaces. All these obstacles result in very little roadway telemetry coming through the windshield. I can’t possibly see everything that goes by, so I imagine what safely driving down the freeway might look like and then look for aberrations in the visual field and use this visual information to correct my internal model. I had to become a spatial supercomputer to survive, but humans with stereovision are also spatial supercomputers—I just need a few extra plugins. For me, the cliché “I’ll believe it when I see it” has become “I’ll see it when I believe it.”

Unlike supercomputers, human brains never examine every piece of hay but have well-developed strategies like associative memory, habituated learning, and a retina (even a single one) that can reduce data by orders of magnitude. Brains only look at things that might be interesting. The future of computing is also going to have to stop sifting through haystacks, and that means saying goodbye to Dr. von Neumann.

Spatial Intelligence

One of the drawbacks of studying the brain from a computer-centric perspective is that pretending the brain works like a computer is deceptively convenient. Although neuroscience has made recent and excellent progress, it is still a fairly young science. To be fair to neurologists, they have to decipher a 50 billion-node supercomputer using their own 50 billion-node supercomputers, where no two clusters have exactly the same version of the operating system and there is no shortage of bugs. The next generation of electronic supercomputers will need to adopt some strategies from the central nervous system, which has been perfected by millions of years of live QA.

The CNS is easily distracted and lazy—probably not a fan of big data since it is ruthless about ignoring data that is no longer interesting. It is interrupt-driven and gets quickly bored with static data. Here is an experiment you can try: close one eye and put gentle pressure on the side of the open eyeball. This temporarily stops the involuntary jitter of the eye-positioning muscles and causes the visual field to slowly fade to black. The eye muscles constantly need to get the attention of the photoreceptors or your world fades away. The neural technique of ignoring data is called habituation: data that is not changing is no longer novel. Because the CNS is designed to be easily distracted, disorders like ADD can result when the balance between habituation and novelty is disrupted.
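A minimal sketch of habituation as a data filter (an analogy only; the stream and the change threshold are invented) would drop readings that haven't changed enough since the last one that got through:

```python
def habituate(stream, min_change=1.0):
    """Yield only readings that differ noticeably from the last one that got through."""
    last = None
    for value in stream:
        if last is None or abs(value - last) >= min_change:
            last = value
            yield value     # novel enough to deserve attention
        # otherwise: habituated, silently ignored

readings = [20.0, 20.1, 20.2, 25.0, 25.1, 19.0]
print(list(habituate(readings)))   # [20.0, 25.0, 19.0]
```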

The eye has about 110 million photoreceptors, but only about one million cables run to the visual cortex, so the reduction between what the jittery retinas receive and what they push upstream is immense. Together, the retinas spatially encode information like the width of a hallway that must be navigated. Seeing with two retinas makes this encoding extremely precise, whereas in my case the precision must be calculated in the cortex. If I’m tired, sick, or tipsy, then I start to make seeing mistakes. When this processing happens in two healthy retinas, the visual cortex is free to concentrate on looking for someone familiar in a crowded hallway. The human brain can recognize familiar faces faster than the threshold of conscious perception. The CNS processes and encodes information at the source, without having to bring all of the data into the brain. A lot of computing happens in the periphery, and this is what differentiates brain architecture from the conventional, centralized counting architectures of computers.
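A sketch of that compute-at-the-periphery idea (the 110-to-1 ratio comes from the paragraph above; the averaging scheme is an invented stand-in for real retinal encoding): the edge reduces a large block of raw samples to a small summary before anything travels upstream.

```python
def encode_at_periphery(raw_samples, reduction=110):
    """Summarize blocks of raw samples locally; only the summaries travel upstream."""
    summaries = []
    for i in range(0, len(raw_samples), reduction):
        block = raw_samples[i:i + reduction]
        summaries.append(sum(block) / len(block))   # stand-in for real spatial encoding
    return summaries

raw = list(range(110_000))            # pretend photoreceptor readings
upstream = encode_at_periphery(raw)   # what actually crosses the "optic nerve"
print(len(raw), "->", len(upstream))  # 110000 -> 1000
```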

The human brain is a ridiculously powerful spatial supercomputer, driven by a crafty, lazy, and easily distracted nervous system. Recent fMRI studies of humans playing or even just listening to music reveal large swaths of the brain lit up like a Christmas tree–we don’t know why music is so important to humans, but we do know that the brain on music is one very happy supercomputer. Put high-performance athletes into an fMRI scanner and watch more happy computers at work. Both big data and, eventually, this emerging form of spatial supercomputing will simultaneously be George Orwell’s worst nightmare and Oliver Sacks’ dream come true.

Associative Supercomputing

The ability to recognize your mother’s face in a crowd, in the blink of an eye, depends heavily on how your mother’s face was remembered. Associative memory in the brain is critical to the speed of the facial recognition engine. I can’t drive down the freeway in the rain at night if I can’t imagine what a safe version of that drive might look like. I have to have something to compare the crude and incomplete image coming from my retina against. That pattern is stored as a complex spatial memory consisting of road signs and landmarks. This imagination strategy will not succeed if I can’t remember the stretch of road.

The brain is also good at retrieving more than just the stretch of road. Spatial memories also come with other pieces of potentially useful information. A stretch of road might be associated with memories of short on ramps, which have a habit of producing unexpected data in the form of accelerating vehicles. Unexpected situations need to be minimized, because quickly learning a new situation under impaired operating conditions produces errors in perception and the disruptive possibility of a fender bender.

Associative memory indexing is very easy for a brain, but very difficult with current computing technology. Spatial supercomputers will be much more difficult to build, and early versions will still rely heavily on massive, classic supercomputers. New approaches to both hardware and software design will be required to implement an associative computer that can recognize faces in milliseconds, but I suspect our first 50-billion-node cluster won’t be as small as a cantaloupe either.
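A toy content-addressable lookup hints at what "associative" means here (nothing like how the brain actually does it; the feature vectors and distance metric are invented): a partial, noisy cue retrieves the closest stored memory rather than requiring an exact key.

```python
import math

# Stored "memories": a label mapped to a small feature vector (features are invented).
memories = {
    "mother's face":   (0.9, 0.2, 0.7),
    "stretch of road": (0.1, 0.8, 0.3),
    "short on-ramp":   (0.2, 0.9, 0.1),
}

def recall(cue):
    """Return the stored memory whose features sit closest to the noisy cue."""
    return min(memories, key=lambda name: math.dist(memories[name], cue))

noisy_glimpse = (0.85, 0.25, 0.65)    # incomplete, imprecise input
print(recall(noisy_glimpse))          # mother's face
```

Doing this over billions of memories in milliseconds is exactly the part that is hard with today's hardware.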

Back from the Future

A Hadoop cluster doesn’t have to be just about crunching big data. Like HPC clusters, Hadoop works because it busts up a problem into a thousand pieces and works on them in parallel. With big datasets, this is pretty much the only way to fly, but tools like Hadoop are just as effective on small datasets, because the point is getting work done in parallel in much less time. Our retinas are like tiny Hadoop clusters. If a Hadoop cluster could be put on a tiny board with hundreds of processors and a few TB of memory, it could be installed in a pair of sunglasses. Brute force and linear parallelism will remain a useful strategy, but figuring out that only five haystacks out of 5,000 are worth searching will get us to the next threshold of computing.
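Here is what "only five haystacks out of 5,000 are worth searching" might look like in miniature (the metadata tags are a made-up stand-in for a real index): a cheap check over summaries decides which piles ever receive the expensive scan.

```python
import random

random.seed(0)

# 5,000 haystacks, each with cheap metadata (tags) and expensive-to-scan straw.
haystacks = [{"tags": {"weather"}, "straw": ["rain"] * 1000} for _ in range(5000)]
for i in random.sample(range(5000), 5):      # only five piles are actually relevant
    haystacks[i] = {"tags": {"shoes"}, "straw": ["prada_shoes"] * 1000}

def worth_searching(haystack, needle_tag):
    """Cheap pruning step: consult the metadata, never touch the straw."""
    return needle_tag in haystack["tags"]

def brute_force_scan(haystack, needle):
    """Expensive step: inspect every piece of straw."""
    return sum(1 for piece in haystack["straw"] if piece == needle)

candidates = [h for h in haystacks if worth_searching(h, "shoes")]
hits = sum(brute_force_scan(h, "prada_shoes") for h in candidates)
print(f"Scanned {len(candidates)} of {len(haystacks)} haystacks, found {hits:,} matches")
# Scanned 5 of 5000 haystacks, found 5,000 matches
```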

As hard as I have tried to dispense with the advances in counting in the last 60 years, the old form of computing won’t go away quickly or quietly; nor should it. The new form of computing will not be about counting–it will be a hybrid of classic techniques combined with more crafty ways of processing information that are similar to the ways our brains work. Big data is not always going to be about data, nor is it always going to be about discovery or insight derived from having improved workflows. It is about the intuition that results from the brain visually integrating information so quickly that the workflow becomes unconscious. Whatever the future of computing will be, it almost certainly starts with a name change.

