Lessons of the Victorian data revolution

Transaction costs, crowdsourcing, and the persuasiveness of data were all in play long ago.

Ken Cukier recently wrote about how useful analogies from the past are in explaining the potential of the current data revolution. Science as we know it was consciously created in the 19th century, and in many ways the current wave of data techniques feels like an echo of that first flood of innovations. It’s fascinating to read histories of the era like “The Philosophical Breakfast Club” and spot the parallels.

Take tides, for example. You’ve probably never worried about the timing or height of the sea, but for Victorian sailors figuring out the tides was a life-or-death problem. Getting it wrong would mean a slipped schedule at best, or a shipwreck at worst. The only people who could accurately predict the tides were harbor masters, since conditions varied widely across different areas and required patient observation by locals. The harbor masters guarded their knowledge so carefully that even British naval captains had to pay them to get access to the information they needed to dock their vessels!

The harbor masters were data producers with a business model that excluded many potential users because the transaction costs were too high to be worthwhile. Sound familiar? That’s the state of many of the datasets I wish were openly available, from real-estate listings to full zip-code boundaries.

The Victorian solution was another familiar face — crowdsourcing. William Whewell arranged for hundreds of volunteers around the world to measure their local sea levels and send the numbers back to him. He then plotted the times of the tidal maximums on a map to create a visualization called a co-tidal chart. Below is a modern version from NASA:

Maps like these, along with more detailed tables, allowed navigators to make their journeys without being ambushed by the tides. This story could be a poster child for our own revolution, with open data fixing a painful real-world problem.

The limits of data

What’s really useful about historical analogies is that you can see how they played out in the long term. The villains of the tidal story were the harbor masters who hoarded their information, but in fact that was only a small part of the value they offered. Despite incredibly detailed maps of every port, we still rely on their descendants to pilot commercial ships into harbor. There’s a world of knowledge about currents, shifting sand banks and traffic patterns that it hasn’t been possible to compress into numbers or rules.

The lesson I draw from this is that in many new areas there are some problems that are easy to fix by gathering and applying data, but we need to keep a bit of humility. James Scott’s “Seeing Like a State” looks at the legacy of the Victorian scientific revolution, and shows how the very success of its ideas had a dark side. Creating datasets may help technical people like us to understand problems and propose solutions, but it also means that harbor masters and other people with deep, lived experience of the domains will be overruled. In the 20th century the prestige of the scientific toolkit was used to justify disasters like the collectivization of agriculture, as technocrats around the world wielded numbers to take power away from “inefficient” smallholders. Those figures were mostly proven bogus by reality, as plans with no knowledge of conditions on the ground failed when confronted with the wildly variable conditions of soil, weather and pests that farmers had spent a lifetime learning to cope with.

The way forward

Specialists like us who can understand and interpret data are in a privileged position. Most people have an exaggerated respect for arguments expressed as numbers or visualizations, because they don’t understand how many assumptions and simplifications go into these creations. It’s our job to remember that and balance our enthusiasm about the power of our techniques with some humility about their limits. It also makes education and popularization even more important, since we need a common language to talk with domain specialists, so they can keep our work honest with their own deep knowledge. The Victorian example shows that if we’re going to improve the world with data, it’s absolutely essential we stay grounded in reality.

Photo: SteamPunk Frankenstein – By D. Mattocks by SteamPunk Frankenstein, on Flickr

