Lessons of the Victorian data revolution

Transaction costs, crowdsourcing, and the persuasiveness of data were all in play long ago.

Ken Cukier recently wrote about how useful analogies from the past are in explaining the potential of the current data revolution. Science as we know it was consciously created in the 19th century, and in many ways the current wave of data techniques feels like an echo of that first flood of innovations. It’s fascinating to read histories of the era like “The Philosophical Breakfast Club” and spot the parallels.

Take tides, for example. You’ve probably never worried about the timing or height of the sea, but for Victorian sailors figuring out the tides was a life-or-death problem. Getting it wrong would mean a slipped schedule at best, or a shipwreck at worst. The only people who could accurately predict the tides were harbor masters, since conditions varied widely across different areas and required patient observation by locals. The harbor masters guarded their knowledge so carefully that even British naval captains had to pay them to get access to the information they needed to dock their vessels!

The harbor masters were data producers with a business model that excluded many potential users because the transaction costs were too high to be worthwhile. Sound familiar? That’s the state of many of the datasets I wish were openly available, from real-estate listings to full zip-code boundaries.

The Victorian solution was another familiar face — crowdsourcing. William Whewell arranged for hundreds of volunteers around the world to measure their local sea levels and send the numbers back to him. He then plotted the times of the tidal maximums on a map to create a visualization called a co-tidal chart. Below is a modern version from NASA:

Maps like these, along with more detailed tables, allowed navigators to make their journeys without being ambushed by the tides. This story could be a poster child for our own revolution, with open data fixing a painful real-world problem.

The limits of data

What’s really useful about historical analogies is that you can see how they played out in the long term. The villains of the tidal story were the harbor masters who hoarded their information, but in fact that was only a small part of the value they offered. Despite incredibly detailed maps of every port, we still rely on their descendants to pilot commercial ships into harbor. There’s a world of knowledge about currents, shifting sand banks and traffic patterns that it hasn’t been possible to compress into numbers or rules.

The lesson I draw from this is that in many new areas there are some problems that are easy to fix by gathering and applying data, but we need to keep a bit of humility. James Scott’s “Seeing Like a State” looks at the legacy of the Victorian scientific revolution, and shows how the very success of its ideas had a dark side. Creating datasets may help technical people like us to understand problems and propose solutions, but it also means that harbor masters and other people with deep, lived experience of the domains will be overruled. In the 20th century the prestige of the scientific toolkit was used to justify disasters like the collectivization of agriculture, as technocrats around the world wielded numbers to take power away from “inefficient” smallholders. Those figures were mostly proven bogus by reality, as plans with no knowledge of conditions on the ground failed when confronted with the wildly variable conditions of soil, weather and pests that farmers had spent a lifetime learning to cope with.

The way forward

Specialists like us who can understand and interpret data are in a privileged position. Most people have an exaggerated respect for arguments expressed as numbers or visualizations, because they don’t understand how many assumptions and simplifications go into these creations. It’s our job to remember that and balance our enthusiasm about the power of our techniques with some humility about their limits. It also makes education and popularization even more important, since we need a common language to talk with domain specialists, so they can keep our work honest with their own deep knowledge. The Victorian example shows that if we’re going to improve the world with data, it’s absolutely essential we stay grounded in reality.

Photo: SteamPunk Frankenstein – By D. Mattocks by SteamPunk Frankenstein, on Flickr

  • Canuck

    It’s good to point out that what we temporarily call “crowd-sourcing” long predates the digital age — another useful example is the Oxford English Dictionary.

    The situation with tides isn’t quite that simple, though. The first publicly-available tide table in the UK was Holden’s, published for Liverpool in 1770 (67 years before Queen Victoria came to the throne), and it was a model for public tide tables for other UK ports. These printed tide tables were the revolution that broke the harbourmasters’ grip on the data and opened it up to the public.

    The UK in the late 18th and 19th century had a huge appetite for these printed data collections — witness Debrett’s Peerage (from 1769), Bradshaw’s railway schedules (from 1839), bible concordances, almanacs, the Navy List, etc. etc. Local tide tables fit nicely onto a middle-class bookshelf beside the rest of these.

    I don’t know much about Whewell’s research work on tides, but it seems to have been something different. He wasn’t interested in the particulars of the tide at a specific port, but wanted to study tides on a global level — it probably didn’t matter if any particular measurement was a bit off, as long as the dataset as a whole worked. Collecting data from a huge number of local tide tables (where they even existed outside the UK) would have been impractical at best, and he needed tide data from outside ports as well.

  • Very interesting, thanks! Here’s the link to my main source for the state of tide tables when Whewell started his work:

    “In one notable incident, the tides flowed over the Blackfriars Bridge in December 1814, flooding Windsor Park and inundating warehouses and businesses nearby.

    Yet, oddly, given the extreme importance of water to Britain, knowledge of the tides was still extremely scanty. Two centuries before, Francis Bacon had suggested an international system of tidal observations to remedy the situation, yet his call had gone unheeded. The only people who systematically observed the times of the high and low tides were harbormasters, and they tended to keep their information as closely guarded secrets: few accurate tide tables based on long-term observations were published and made available. The Royal Navy had no such information; captains were responsible for trying to gain the information on their own, by contacting harbormasters and hoping to get useful information from them – usually by paying bribes.”

    I’d love to learn more about the history of tidology though, and it sounds like you might have some recommendations beyond popular science works like the Breakfast Club?

  • Read your article with great interest. When I reached the comments about agriculture, I wondered exactly what you meant. Are you referring to the communist-state style of production, or to the modern, very scientific, data-focused methods of farming today?

    I blogged my thoughts here: http://daringrimm.wordpress.com/2011/05/23/sad-example-of-how-deeply-ingrained-the-suspicion-of-modern-agriculture-is/ when I was concerned the comments were in regard to modern agriculture, but now I am suspicious I may have mis-read?

  • Thanks Darin, that is a good point. It was hard to compress into a short blog post, but I was referring specifically to the Utopian agricultural schemes like collectivization. I highly recommend the Seeing Like a State book that discusses some of the history of the mega-projects that actually spanned the political spectrum in the first half of the 20th century.

    I would hate to imply that I’m against agricultural science, it has produced wonders for the world. The problem comes when the mantle of science is used to justify social policies without the rigor of the true scientific process being applied.

  • Hope it’s okay to jump in here. First, thanks, Pete, for bringing my book into the discussion of a modern issue in scientific method–I certainly agree with you that these historical analogies can be quite telling, indeed that’s one reason I wrote the book!

    It’s true that there were the Liverpool tide tables, and a few others, but there was no widespread attempt to create and publish these throughout the UK. Certainly, there was little published about the London tides, and these were becoming more and more unpredictable with all the construction along the Thames. It was not until the 19th century that the effort was made to publish tide tables systematically (and this is when the government got involved, via the Chief Hydrographer Francis Beaufort–inventor of the Beaufort Scale, supporter of Darwin’s position on the HMS Beagle, and friend of William Whewell and John Herschel).

    It’s also true that Whewell’s particular interest was in mapping the global patterns of high tides. But his method for doing this included organizing massive amounts of simultaneous coastal tide observations around the world. In a sense he can be seen as an innovator in international scientific research, because he (and Beaufort) got numerous countries involved. At the end of it he had over 40,000 data points that were “reduced” by “computers,” that is the people who did calculations in the days before machines which ended up with that name! (Whewell’s friend Charles Babbage, with whom he, John Herschel, and the economist Richard Jones had those “Philosophical Breakfasts” at Cambridge University, was the one who first invented a mechanical computer, the Analytical Engine.)

    One reference book on the history of tidology worth checking out is David Cartwright’s TIDES: A SCIENTIFIC HISTORY (Cambridge Univ. Press, 1999).

  • Thanks so much Laura, I really appreciate you jumping in, and I’ve ordered the Cartwright history. I’m also glad to hear I didn’t mangle your meaning too badly in my re-telling.

    I’ll be returning to your book in future articles, it’s so rare to find a warts-and-all intellectual history. I think every aspiring engineer should read it, if only to learn from the mistakes Babbage made in his computing projects. I know they reminded me of some of my own tendencies, though hopefully I’m not quite so argumentative!

  • Robert Lucente

    In keeping w/ the “Victorian” theme, check out the book “The Victorian Internet: The Remarkable Story of the Telegraph and the Nineteenth Century’s On-line Pioneers” by Tom Standage.

  • Shari Kay Hunter

    Thanks, this was awesome. Write more! Great to hear from Laura Snyder. Much appreciated, Shari