The dangers of data-driven list-making

Such lists might mean we miss the truly great breakthroughs, inspirations, and leaps of faith necessary to evolve.

Editor’s note: this post originally appeared on Tilt the Windmill; it is republished here with permission.

Startupfest’s Pamela Perotti asked for my thoughts on this great Forbes piece by Lightspeed’s Barry Eggers about using big data to build top ten lists that actually matter.

First: it’s an excellent post. You should read it. I’ll wait.

Every enterprise decision-maker will soon be running their business according to the lists Barry envisions, as the power of big data and analytics finds its way into every boardroom and dashboard. Society will soon demand them, too. But while such analysis is tremendously valuable, it carries two dangers: the politics of setting criteria, and the trap of relying on data for inspiration.

The harsh light of data

Barry is right: rather than using our precious time and resources to make yet another linkbait list of the 50 cutest kittens, or the seven people I’ll try to avoid at SXSW, we should use abundant data and a connected world to build lists that matter: lying politicians, bad cars, lousy doctors. Then we can use these lists to change policy and behaviour because we’ll make things transparent. Shining the harsh light of data on something can improve it.

Unfortunately, expecting big data to be a panacea that cures all our ills is overreaching and can lead to the kind of hype that scuttles otherwise ascendant technologies.

The unquestionable truth is that we optimize better with data. We can indeed build good lists. Assuming the criteria we choose are tied to outcomes we want, and reasonably objective — politicians’ lies, cars’ maintenance costs, doctors’ effectiveness — that’s a good thing.

What’s even better, and what Eggers’ article overlooks, is that because the lists are generated by software, they can be tailored. Each of us can have our own list because the cost of producing a list is nearly zero. Rather than just a list of safe cars, what about the safest car for Pamela, based on her driving record, weight, and the climate around her house? A tailored list is far better than an average one, and data makes it possible to customize lists so they’re as good as possible.

(That’s also one of the big reasons we’re willing to disclose some things about ourselves: we get better results. A drop in privacy is an increase in utility.)

There are two big downsides

The first big downside to data-driven list-making is the question of who gets to set the criteria for a list. A list of the worst foods might include bacon and foie gras, but there are plenty of folks who’d rank those particular dishes among the best because they care about taste rather than health or ethics. Walking home might be better for your health, but it means you have a half hour less to play with your children. Everything’s a tradeoff, and someone has to decide what the right criteria are.
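To make the criteria problem concrete, here is a minimal sketch of how the same data produces opposite lists depending on who sets the weights. The food scores, the weights, and the `rank` helper are all hypothetical, invented purely for illustration; nothing here comes from the Forbes piece.

```python
# Hypothetical scores on a 0-10 scale (higher is better), invented for illustration.
FOODS = {
    "bacon":     {"taste": 9, "health": 2, "ethics": 4},
    "foie gras": {"taste": 8, "health": 3, "ethics": 1},
    "kale":      {"taste": 3, "health": 9, "ethics": 9},
    "lentils":   {"taste": 5, "health": 8, "ethics": 9},
}

def rank(items, weights):
    """Return item names sorted best-first by weighted score."""
    def score(name):
        return sum(weights.get(k, 0) * v for k, v in items[name].items())
    return sorted(items, key=score, reverse=True)

# Two list-makers, same data, different (assumed) priorities.
health_first = rank(FOODS, {"taste": 0.1, "health": 0.6, "ethics": 0.3})
taste_first = rank(FOODS, {"taste": 0.9, "health": 0.05, "ethics": 0.05})

print("Health-first list:", health_first)
print("Taste-first list: ", taste_first)
```

The data never changes between the two calls; only the weights do. Whoever picks the weights decides whether bacon lands at the top of the list or near the bottom.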

This is policy-making at its finest. And it’s never easy. Just try to get 10 people to agree on what constitutes a “good” politician when five of them support the NRA and five support Obamacare, and you’ll see how quickly this kind of criteria-setting devolves into a nanny-state argument for anything that’s even slightly subjective.

The second downside is an over-reliance on data. Any time you try to optimize something, you run into a problem known as a local maximum. You’re using data and algorithms to do the best thing you can within your current model. But the scope of your model might be wrong.

Imagine, for example, a lemonade stand. You use data to optimize everything — cost of lemons, pricing, where to set up, and so on. You’re making a ton of profit.

But perhaps you could make way more profit by selling iced tea. Because you framed your business as a “lemonade stand” and not a “refreshment provider,” you missed the opportunity to get to an even better position — what’s called a global maximum.

Of course, maybe you should be a refreshment truck or a sandwich shop. There’s no end to the scope you might choose, and you have to constrain things somehow; otherwise you have no business model. Framing is necessary because it lets you focus on a business, rather than running madly in all directions. But a tight focus on the current business model means that surrounding opportunities vanish into the periphery.
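The lemonade stand can be sketched as a toy hill-climbing problem. The profit curves, the starting price, and the function names below are assumptions made up for illustration, not real figures. The optimizer only ever adjusts price, so it finds the best lemonade price and stops there.

```python
# Hypothetical daily-profit curves, invented for illustration.
def profit(product, price):
    if product == "lemonade":
        return 100 - 40 * (price - 2.0) ** 2   # peaks at $2.00 with profit 100
    if product == "iced tea":
        return 160 - 40 * (price - 3.0) ** 2   # peaks at $3.00 with profit 160
    raise ValueError(product)

def hill_climb(product, price, step=0.25, max_iters=1000):
    """Nudge the price up or down while profit improves; the product never changes."""
    for _ in range(max_iters):
        candidates = (price - step, price, price + step)
        best = max(candidates, key=lambda p: profit(product, p))
        if best == price:
            break  # no neighboring price is better: a maximum within this frame
        price = best
    return price, profit(product, price)

# Optimizing inside each frame separately, starting from a $1.00 price.
results = {p: hill_climb(p, 1.0) for p in ("lemonade", "iced tea")}
best_product = max(results, key=lambda p: results[p][1])

print(results)
print("Best frame:", best_product)
```

Note what the code does not do: it never proposes iced tea on its own. A person had to add that product to the list of frames before the optimizer could compare them, and nothing in the data suggests a sandwich shop.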

Put another way, we sometimes mistake optimization for inspiration. Data is for optimization; humans are for inspiration. Expecting the former to give you the latter is a bad thing.

Local maxima kill companies

Missing a global maximum — more specifically, failure to frame (and reframe) the business — is what ends big, incumbent companies. Think about Kodak, which pioneered film but missed digital cameras and smartphones. They failed to reframe their business as “sharing memories.” When you think of a picture that way, through modern eyes, an Internet connection is a pretty obvious feature of a picture-taking device.

Want another example? Blockbuster ran afoul of the local maximum when it thought it was in the video store business and got addicted to late fees as a source of revenue. No amount of optimization through data would have told Blockbuster, “get less money from your customers by mailing them DVDs with no return date.”

Framing problems are obvious in hindsight. A decade ago we didn’t have a smartphone in every pocket, and Kodak’s demise wasn’t clear to everyone. When one of BlackBerry’s CEOs wondered why anyone would want a camera on their phone, he was guilty of the same framing error.

“…it was the ‘candy bar’ format, and it had a track wheel, and it had really good connectivity. It was really nice for scrolling around, and it could play video, and it had a camera. Up until that point, Mike (Lazaridis) had said, ‘That’s crazy, why would I ever want a camera?’ All of a sudden, BlackBerry becomes a consumer play.” (from Micheal A. Levin)

Are lists of cars and politicians the best way to fix driving and politics?

Back to the examples presented in the article. Perhaps the best, safest car is a self-driving car. Studies suggest that this is the case. But early on, such cars need a lot of maintenance (perhaps because there aren’t enough of them made to work out the bugs). They’re also expensive (because the manufacturers can’t amortize the cost of invention across many vehicles). So year after year, the self-driving cars rank as less reliable and more expensive on the list. If people listen to the list, the cars don’t get bought. We’re stuck with dangerous, human-driven vehicles. Innovation grinds to a halt.

For another example from the Forbes piece, consider lying politicians. We might see a ranking of dishonest members of Congress, but the algorithm or the data is unlikely to step back and say, “maybe representative government is a hack. Maybe lobbying is literally misrepresentation. Maybe in an era of Facebook, we don’t need representatives, and instead we should use digital voting and a direct democracy.”

In other words, data-driven optimization is great for doing the best at the game we’re currently playing; it’s awful for changing the rules or switching to a different game.

I love the idea of big data helping us better understand and optimize the world around us. I do think it gives an uncomfortable amount of power to those who create the tools that make the lists, and I question whether such lists will mean we miss the truly great breakthroughs, inspirations, and leaps of faith through which we evolve as a species.


