ENTRIES TAGGED "data ethics"
Kate Crawford argues for caution and care in data-driven decision making.
Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif., about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, and hammered home the constant need for context in conclusions.
Crawford explored many of these same topics in our interview, which follows.
The battle to open source OFA code; a student hacker uncovers security flaw, gets expelled; and ethics and taxes for user data collection.
A cloudy future for Obama’s election code
A battle is brewing between politicians and the dream team of programmers that helped Obama win the nerdiest election ever. Ben Popper reports at The Verge that the programmers who worked on the Obama for America (OFA) 2012 campaign want to open source the code behind the campaign’s website, its donation collection and email systems, and its mobile app. Yet “[t]hree months after the election, the data and software is still tightly controlled by the president and his campaign staff, with the fate of the code still largely undecided,” Popper writes.
OFA’s director of front-end engineering, Daniel Ryan, told Popper that he believes the Democratic National Committee (DNC) will “mothball” the tech. He argues that it should be open because it was built on top of open source code and should therefore go back to the public. Popper also notes that if the DNC keeps the code on ice until the 2016 election, it will be useless. “But if our work was open and people were forking it and improving it all the time,” Ryan told Popper, “then it keeps up with changes as we go.” Ryan also points out that keeping the code closed would not only stifle development for the next election but also keep other progressive organizations from building on it over the next four years.
Popper reports that a DNC official responded to a request for comment, stating that “OFA is still working out the future of their tech and data infrastructure so any speculation at this time is premature and uninformed.” You can read Popper’s in-depth report at The Verge.
Here are a few stories from the data space that caught my attention this week.
Presidential candidates are mining your data
Data is playing an unprecedented role in the US presidential election this year. The two presidential campaigns have access to personal voter data “at a scale never before imagined,” reports Charles Duhigg at the New York Times. The candidate camps are using personal data in polling calls, accessing such details as “whether voters may have visited pornography Web sites, have homes in foreclosure, are more prone to drink Michelob Ultra than Corona or have gay friends or enjoy expensive vacations,” Duhigg writes. He reports that both campaigns emphasized they were committed to protecting voter privacy, but notes:
“Officials for both campaigns acknowledge that many of their consultants and vendors draw data from an array of sources — including some the campaigns themselves have not fully scrutinized.”
A Romney campaign official told Duhigg: “You don’t want your analytical efforts to be obvious because voters get creeped out. A lot of what we’re doing is behind the scenes.”
The “behind the scenes” aspect may itself be enough to creep people out. These sorts of situations are starting to tarnish the image of the consumer data-mining industry, and a Manhattan trade group, the Direct Marketing Association, is launching a public relations campaign, the “Data-Driven Marketing Institute,” to smooth things over before government regulators get involved. Natasha Singer reports at the New York Times:
“According to a statement, the trade group intends to promote such targeted marketing to lawmakers and the public ‘with the goal of preventing needless regulation or enforcement that could severely hamper consumer marketing and stifle innovation’ as well as ‘tamping down unfavorable media attention.’ As part of the campaign, the group plans to finance academic research into the industry’s economic impact, said Linda A. Woolley, the acting chief executive of the Direct Marketing Association.”
One of the biggest issues, Singer notes, is that people want control over their data. Chuck Teller, founder of Catalog Choice, told Singer that in a recent survey conducted by his company, 67% of respondents said they wanted to see the data collected about them by data brokers, and 78% said they wanted the ability to opt out of the sale and distribution of that data.
The biggest threat that a data-driven world presents is an ethical one.
Since the first of our ancestors chipped stone into a weapon, technology has divided us. Seldom more than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of prejudice masked as science.
On its surface, a data-driven society is more transparent and makes better use of its resources. By connecting human knowledge and mining it for insights, we can pinpoint problems before they become disasters, warding off disease and shining the harsh light of data on injustice and corruption. Data is making cities smarter, watering the grass roots, and improving the way we teach.
But for every accolade, there’s a cautionary tale. It’s easy to forget that data is merely a tool, and in the wrong hands, that tool can do powerful wrong. Data erodes our privacy. It predicts us, often with unerring accuracy — and treating those predictions as fact is a new, insidious form of prejudice. And it can collect the chaff of our digital lives, harvesting a picture of us we may not want others to know.
The big data movement isn’t just about knowing more things. It’s about a fundamental shift from scarcity to abundance. Most markets are defined by scarcity — the price of diamonds, or oil, or music. But when things become so cheap they’re nearly free, a funny thing happens.
Consider the advent of steam power. Economist William Stanley Jevons, in what’s known as Jevons’ Paradox, observed that as the efficiency of steam engines increased, coal consumption went up. That’s not what was supposed to happen. Jevons realized that abundance creates new ways of using something: as steam became cheap, we found new uses for it, and those uses created demand.
The same thing is happening with data. A report that took a month to run is now just a few taps on a tablet. An unthinkably complex analysis of competitors is now a Google search. And the global distribution of multimedia content that once required a broadcast license is now an upload.
The future of desktops, ethics and big data, narrative vs spreadsheets.
This week on O'Reilly: Josh Marinacci predicted that 90% of computer users will rely on mobile, but 10% will still need desktops; the authors of "Ethics of Big Data" explored data's trickiest issues; and Narrative Science CTO Kris Hammond discussed narrative's role in data analytics.
Beware of just-so data stories that sound reasonable but cannot be conclusively proven.
Bradley Voytek: "Our goal as data scientists should be to distill the essence of the data into something that tells as true a story as possible while being as simple as possible to understand."
Solon Barocas on data mining's reputation and the ethics of data collection.
Solon Barocas, a doctoral student at New York University, discusses consumer perceptions of data mining and how companies and data scientists can shape data mining's reputation.