Understanding skepticism

Skepticism isn't a blanket rejection of data; it's central to understanding data.

I’d like to correct the impression, given by Derrick Harris on GigaOm, that I’m part of a backlash against “big data.”

I’m not skeptical about data or the power of data, but you don’t have to look very far or very hard to see data abused. The best people to be skeptical about the data, and to point out the abuse of data, are data scientists because they understand problems such as overfitting, bias, and much more.

Cathy O’Neil recently wrote about a Congressional hearing in which a teacher at a new data science program dodged some perceptive questions about whether he was teaching students to be skeptical about results, whether he was teaching students how to test whether their observations were real signals or just noise. Anyone who has worked with data knows that false correlations come cheaply, particularly when you’re working with a lot of data. But ducking that question is not the attitude we need.
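To see how cheaply false correlations come, here is a minimal sketch in Python. It assumes pure Gaussian noise; the function names, sample sizes, and seed are hypothetical choices for illustration, not anything taken from the hearing or from O'Neil's post. It generates one random "target" series and many unrelated random "features," then reports the strongest correlation it stumbles on.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def max_spurious_correlation(n_features=1000, n_samples=20, seed=0):
    """Largest |r| between a noise target and n_features noise columns.

    Everything here is independent random noise (an assumption of this
    sketch), so any correlation found is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, feature)))
    return best


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, "noise features -> strongest |r| =",
              round(max_spurious_correlation(n_features=k), 2))
```

Because the seed is fixed, each larger run includes the same noise columns as the smaller ones, so the strongest correlation can only grow as more columns are searched. None of these columns has any real relationship to the target, which is why a skeptic asks how many comparisons were made before being impressed by the best one.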

Data is valuable. I see no end to the collection or analysis of data, nor should there be an end. But as with any tool, we have to be careful about how we use it. Skepticism isn’t a blanket rejection of data; it’s central to understanding data. That’s precisely what makes “science” science.

And of all people, journalists should understand what skepticism means, even if they don’t have the technical tools to practice it.


  • Derrick Harris

    Actually, Mike, my post wasn’t about not being skeptical. Skepticism is great. It was about — among other things — the misconception that big data is all about correlation and therefore it needs to be taken down a rung.

    I think your post was right on in its message and would be a much-needed reality check if people pushing big data really were pushing this correlation-is-as-good-as-causation argument. I just don’t know anyone who really gets it who actually believes that, at least for non-trivial things.

    • Mike Loukides

      I’m happy to agree… though I have to say that I do see a lot of people pushing “correlation-as-good-as-causation” or, even worse, never even getting to the level of critical thinking where they raise that question. I was talking to someone just the other day who was lamenting that “too often, data analysis is just torturing the data until it supports what the CEO wants to see.” Way too much of that going on.

      I do wish we could lose the word “big.” Data is data. As I’ve said, I like Roger Magoulas’ definition: “big is when the size of the data is part of the problem.” But more often, it’s just hype, and not particularly helpful.

      • Derrick Harris

        Maybe we just talk to different people, although we both read the same book ;-)

  • atolley

    I thought the Harris article was very good, but I have to disagree with the assertion “that it relies more on correlation than causation in order to find its vaunted insights. To the extent that’s true, it’s a fair criticism. Only I’m not certain how often it’s true for things that really matter.”

    Perhaps the biggest example was the Reinhart and Rogoff paper on debt and GDP growth: two top-flight economists arguing that their analysis (essentially correlation) implied high debt levels caused low GDP growth, rather than the reverse. The policy implications for the current state of the global economy really do matter.