If your data practices were made public, would you be nervous?

Solon Barocas on data mining's reputation and the ethics of data collection.

The practice of data mining often elicits a knee-jerk reaction from consumers, some of whom view it as a violation of their privacy. In a recent interview, Solon Barocas (@s010n), a doctoral student at New York University, discussed perceptions of data mining and how companies can address its reputation.

Highlights from the interview (below) included:

  • What do consumers think data mining entails? “Data mining almost intuitively for most consumers implies scavenging through the data, trying to find secrets that you don’t necessarily want people to know,” Barocas said. “It’s really difficult to explain what data mining actually is. I think of it, in a sense, to be a particular form of machine learning. And these are complicated things — very, very complicated. A challenge for people in the industry, regulators, and anyone else interested in these issues, is to figure out a way to communicate these technical things to a lay audience.” [Discussed at the 0:41 mark.]
  • Do we need a different phrase in lieu of “data mining”? Barocas argued: “[We should] try to push back against the misuses of the term, re-appropriate the term data mining, and explain it’s not ‘data-dredging.’ It’s not this case of running through everyone’s data. We need to instead explain data mining is a kind of analysis that lets us discover interesting and important new trends. I think there’s an enormous amount of value in data mining and being able to explain precisely what that value is without making it seem like it’s just snooping.” [Discussed at 1:12.]
  • What “ethical red flags” should companies and data scientists be aware of? “There are potential problems all along the line,” Barocas said, noting that it can be difficult for companies performing analysis to know what to collect and what not to collect. “The rule of thumb: If your practice was made public, widely public, would you be nervous?” Barocas said he realizes that’s “not a very sophisticated rule,” but it’s one that might guide responsible practice in the data mining space. [Discussed at 2:50.]

The full interview is available in the video below:


Some quotes from this interview were edited and condensed for clarity.

