The promise and problems of big data

A look at the social and moral implications of living in a deeply connected, analyzed, and informed world.

Editor’s note: This is an excerpt from our new report Data: Emerging Trends and Technologies, by Alistair Croll. The report is available as a free download.

We’ll now look at both the light and the shadows of this new dawn, the social and moral implications of living in a deeply connected, analyzed, and informed world. This is both the promise and the peril of big data in an age of widespread sensors, fast networks, and distributed computing.

Solving the big problems

The planet’s systems are under strain from a burgeoning population. Scientists warn of rising tides, droughts, ocean acidity, and accelerating extinction. Medication-resistant diseases, outbreaks fueled by globalization, and myriad other semi-apocalyptic Horsemen ride across the horizon.

Can data fix these problems? Can we extend agriculture with data? Find new cures? Track the spread of disease? Understand weather and marine patterns? General Electric’s Bill Ruh says that while the company will continue to innovate in materials science, it expects its real gains to come from analytics.

It’s often been said that there’s nothing new about big data. The “iron triangle” of Volume, Velocity, and Variety, the three Vs Doug Laney described in 2001, has constrained all data since the first database: you could have any two of the three fairly affordably, but not all three. Consider:

  • A coin-sorting machine sorts a large volume of coins rapidly, but assumes a small variety of coins. It wouldn’t work well if there were hundreds of coin types.
  • A public library, organized by the Dewey Decimal System, has a wide variety of books and topics, and a large volume of those books — but stacking and retrieving the books happens at a slow velocity.

What’s new about big data is that the cost of getting all three Vs has fallen so far that it’s almost not worth billing for. A Google search returns results almost instantly, combs the sum of online knowledge, and retrieves a huge variety of content types.

With new affordability comes new applications. Where once a small town might deploy another garbage truck to cope with growth, today it can affordably analyze routes to make the system more efficient. Ten years ago, a small town didn’t rely on data scientists; today, it scarcely knows it’s using them.

Gluten-free dieters aside, Norman Borlaug is widely credited with saving a billion lives by carefully breeding wheat and increasing the world’s food supply. Will the next billion meals come from data? Monsanto thinks so, and is making substantial investments in analytics to increase farm productivity.

While much of today’s analytics is focused on squeezing the most out of marketing and advertising dollars, organizations like DataKind are finding new ways to tackle modern challenges. Governments and for-profit companies are making big bets that the answers to our most pressing problems lie within the very data they generate.

The death spiral of prediction

The city of Chicago thinks a computer can predict crime. But does profiling doom the future to look like the past? As Matt Stroud asks: is the computer racist?

When governments share data, that data changes behavior. If a city publishes a crime map, police concentrate their patrols where they are most likely to catch criminals. Homeowners who can afford to leave flee the area, businesses shutter, and the high-crime prediction turns into a self-fulfilling prophecy.

Call this, somewhat inelegantly, algorithms that shit where they eat. As we consume data, it influences us. Microsoft’s Kate Crawford points to a study that shows Google’s search results can sway an election.

Such feedback loops can undermine the utility of algorithms. How should data scientists deal with them? Does every algorithm have a limited useful life? When should an algorithm, or the data it produces, be kept private for the public good? These problems will dog data scientists in the coming years.
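The crime-map feedback loop can be made concrete with a toy simulation. This is a minimal sketch under loudly stated assumptions, not Chicago’s model or any real policing system: two districts share the same hypothetical true incident rate, a single patrol is always sent to the district with the most recorded crime (a greedy allocation rule chosen only for illustration), and records capture every incident in the patrolled district but only a citizen-reported fraction elsewhere. The function name simulate and every number in it are hypothetical.

```python
from typing import List


def simulate(weeks: int, true_rate: int, citizen_reports: int,
             history: List[int]) -> List[int]:
    """Toy model of a predictive-patrol feedback loop (hypothetical).

    Every district has the same true number of incidents per week
    (true_rate). The patrolled district records all of them; every other
    district records only citizen_reports incidents. Each week the patrol
    goes to the district with the highest cumulative recorded count.
    """
    recorded = list(history)  # copy so the caller's list is not mutated
    for _ in range(weeks):
        # Greedy "predict the hotspot" rule; max() breaks ties by lowest index.
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        for i in range(len(recorded)):
            if i == target:
                recorded[i] += true_rate  # patrol observes every incident
            else:
                recorded[i] += citizen_reports  # only reported incidents count
    return recorded


if __name__ == "__main__":
    a, b = simulate(weeks=10, true_rate=10, citizen_reports=2, history=[5, 4])
    print(a, b)  # 105 24
```

In this sketch a one-incident head start sends every patrol to the first district, and after ten weeks the records read 105 against 24 even though both districts have identical true rates. Swap the starting history and the outcome flips: the data ends up describing where the algorithm looked rather than where crime actually happened.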

Sensors, sensors everywhere

In a Craigslist post that circulated in mid-2014 (since taken down), a restaurant owner ranted about how customers had changed. Hoping to boost revenues, the story went, the restaurant hired consultants who reviewed security footage to detect patterns in diner behavior.

The restaurant happened to have 10-year-old footage of its dining area, and the consultants compared it to the new recordings, concluding that smartphones had significantly altered both diner behavior and the time diners spent in the restaurant.

If true, that’s interesting news if you’re a restaurateur. For the rest of us, it’s a clear lesson in just how much knowledge is lurking in pictures, audio, and video that we don’t yet know how to read but soon will.

Image recognition and interpretation — let alone video analysis — is a Very Hard Problem, and it may take decades before we can say, “Computer, review these two tapes and tell me what’s different about them” and get a useful answer in plain English. But that day will come — computers have already cracked finding cats in online videos.

When that day arrives, every video we’ve shot and uploaded — even those from a decade ago — will be a kind of retroactive sensor. We haven’t been very concerned about being caught on camera in the past because our behavior is hidden by the burden of reviewing footage. But just as yesterday’s dumpster-diving and wiretaps gave way to today’s effortless surveillance of whole populations, we’ll realize that the sensors have always been around us.

Already obvious are the smart devices on nearly every street and in every room. Crowdfunding sites are a treasure-trove of such things, from smart bicycles to home surveillance. Indeed, littleBits makes it so easy to create a sensor, it’s literally kids’ play. And when Tesla pushes software updates to its cars, the company can change what it collects and how it analyzes it long after the vehicle has left the showroom.

The evolution of how we collect data in a world where every output is also an input — when you can’t read a thing without it reading you back — poses immense technical and ethical challenges. But it’s also a massive business opportunity, changing how we build, maintain, and recover almost everything in our lives.
