I’d like to correct the impression, given by Derrick Harris on GigaOm, that I’m part of a backlash against “big data.”
I’m not skeptical about data or the power of data, but you don’t have to look very far or very hard to see data abused. The best people to be skeptical about data, and to point out its abuse, are data scientists, because they understand problems such as overfitting, bias, and much more.
Cathy O’Neil recently wrote about a Congressional hearing in which a teacher at a new data science program dodged some perceptive questions about whether he was teaching students to be skeptical about results and how to test whether their observations were real signals or just noise. Anyone who has worked with data knows that false correlations come cheaply, particularly when you’re working with a lot of data. Ducking that question is not the attitude we need.
Data is valuable. I see no end to the collection or analysis of data, nor should there be an end. But as with any tool, we have to be careful about how we use it. Skepticism isn’t a blanket rejection of data; it’s central to understanding data. That’s precisely what makes “science” science.
And of all people, journalists should understand what skepticism means, even if they don’t have the technical tools to practice it.