We need open models, not just open data

If you really want to understand the effect data is having, you need the models.


Writing my post about AI and summoning the demon led me to re-read a number of articles on Cathy O’Neil’s excellent mathbabe blog. I highlighted a point Cathy has made consistently: if you’re not careful, modeling has a nasty way of enshrining prejudice with a veneer of “science” and “math.”

Cathy has consistently made another point that’s a corollary of her argument about enshrining prejudice. At O’Reilly, we talk a lot about open data. But it’s not just the data that has to be open: it’s also the models. (There are too many must-read articles on Cathy’s blog to link to; you’ll have to find the rest on your own.)

You can have all the crime data you want, all the real estate data you want, all the student performance data you want, all the medical data you want, but if you don’t know what models are being used to generate results, you don’t have much. You’re going to be showing black people homes in predominantly black neighborhoods, not because you want to keep white neighborhoods pure, but because that’s where the model says they’re most likely to buy. You’re going to be stopping and searching more minority drivers without cause, not because you’re prejudiced, but because the model says they’re more likely to be arrested for crimes. And if you stop more minority drivers, you almost certainly will arrest more minority drivers, so the model becomes self-fulfilling.
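The self-fulfilling loop described above can be sketched as a toy simulation. Everything in it is a hypothetical assumption for illustration, not a description of any real policing system: two groups, "A" and "B", carry contraband at exactly the same rate, stops start out skewed 60/40 toward group B, and the "model" simply allocates the next round's stops in proportion to each group's share of past arrests.

```python
"""Toy simulation of a self-fulfilling policing model (all numbers hypothetical)."""


def simulate(rounds: int, total_stops: int = 1000) -> list[dict[str, float]]:
    # Hypothetical: both groups carry contraband at exactly the same rate.
    true_rate = {"A": 0.05, "B": 0.05}
    # Hypothetical starting bias: group B receives 60% of all stops.
    stop_share = {"A": 0.4, "B": 0.6}
    history = []
    for _ in range(rounds):
        # Expected arrests: contraband is only found in cars that get stopped.
        arrests = {g: total_stops * stop_share[g] * true_rate[g] for g in stop_share}
        total = sum(arrests.values())
        arrest_share = {g: arrests[g] / total for g in arrests}
        history.append(arrest_share)
        # The "model": next round's stops follow each group's share of arrests.
        stop_share = arrest_share
    return history


for i, share in enumerate(simulate(5)):
    print(f"round {i}: A={share['A']:.2f} B={share['B']:.2f}")
```

Every round prints A=0.40 and B=0.60: the arrest data keep "confirming" that group B is riskier even though the two groups behave identically, because the data only come from where the model already chose to look.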

Intentions mean nothing when they’re hidden behind a model that makes decisions for you. A recent study of police profiling in my state, Connecticut, showed not only that blacks were more likely to be stopped than whites, but also that when they were stopped and searched, whites were significantly more likely to have something illegal in their cars. How would we build a model from this data, and what would it show? How would we know what the model is doing if it’s never examined? Would the column with surprising data be dropped because it leads to unexpected and politically unacceptable results? Would it be weighted less than a column on, say, past arrests? If the model isn’t open, how would you ever know?

As we become more dependent on modeling, more and more of our world becomes inscrutable. Without the models, you will never understand the way financial markets are manipulated. Without the models, you will never understand how school teachers are evaluated. You may never know why the real estate agent showed you certain houses, or why you’re paying so much for insurance. Is that OK? It all seems nice and scientific.
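The questions about dropping or down-weighting a column can be made concrete with a minimal sketch. It assumes a simple linear risk score over two hypothetical features, past arrests and search hit rate (the share of searches that actually found something illegal); the areas, values, and weights are invented for illustration. Both models see identical data; only the hidden weights differ, and that alone decides where the patrols go.

```python
"""Same data, two undisclosed models (all names and values hypothetical)."""

# Hypothetical areas and feature values, chosen only for illustration.
AREAS = {
    "Area X": {"past_arrests": 3.0, "search_hit_rate": 0.1},
    "Area Y": {"past_arrests": 1.0, "search_hit_rate": 0.3},
}


def score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Linear risk score; the weights are what an outsider never sees."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def most_patrolled(weights: dict[str, float]) -> str:
    """Return the area this model would send the most patrols to."""
    return max(AREAS, key=lambda area: score(AREAS[area], weights))


# Model 1: the "surprising" hit-rate column is silently dropped (weight 0).
drops_column = {"past_arrests": 1.0, "search_hit_rate": 0.0}
# Model 2: the hit-rate column is kept and weighted heavily.
keeps_column = {"past_arrests": 0.2, "search_hit_rate": 10.0}

print(most_patrolled(drops_column))  # Area X
print(most_patrolled(keeps_column))  # Area Y
```

Without access to something like the weight dictionaries here, an outside observer with the full data set still can't tell which of the two models is in use, which is the point of asking for open models rather than only open data.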

Open data enables the democratization of data. It’s important to be able to do your own analysis of public data sets. But if you really want to understand the effect data is having on law enforcement, insurance, education, or the economy, you need the models. Cathy has documented being stonewalled on requests for the models, which are almost always viewed as proprietary. That’s a problem, particularly when the modelers (not the poets) become the “unacknowledged legislators of the world” (Shelley, A Defence of Poetry).

Open models: the time has come.

Cropped image on article and category pages by Sonny Abesamis on Flickr, used under a Creative Commons license.

  • Misha Belkindas

    Of course we need to show the models and how the data were derived. That is called metadata, or data about the data. The absence of metadata when big data are released creates problems with utilization, because it is not disclosed how the data were collected and compiled.

  • Ezra Pound

    “Good writers are those who keep the language efficient…

    If a nation’s literature declines, the nation atrophies and decays.

    Your legislator can’t legislate for the public good,”

    Ezra Pound ABC of Reading

  • Mark Thristan

    This is simply what our math teachers told us: show your working.
