Is data collection entering discriminatory territory? Also, big data's role in crime fighting and its debut in the NBA.
Data mining opens new doors for discrimination, marginalization
Fertik writes: “For most of the Internet’s short history, the primary goal of this data collection was classic product marketing: for example, advertisers might want to show me Nikes and my wife Manolo Blahniks. But increasingly, data collection is leapfrogging well beyond strict advertising and enabling insurance, medical and other companies to benefit from analyzing your personal, highly detailed ‘Big Data’ record without your knowledge. Based on this analysis, these companies then make decisions about you — including whether you are even worth marketing to at all.”
The consequences of such detailed data mining run deep. Fertik notes that advances in online data mining are enabling companies to “skirt the spirit of the law” and make discriminatory choices in who receives credit or loan offers, for example, by simply not displaying online offers to less credit-attractive users. “If you live on the wrong side of the digital tracks,” he says, “you won’t even see a credit offer from leading lending institutions, and you won’t realize that loans are available to help you with your current personal or professional priorities.”
Big data's role in the US presidential election, trends shaping the future of data, and far-reaching consequences of the Megaupload case.
Here are a few stories from the data space that caught my attention this week.
Big data, big politics
In the aftermath of the US presidential election, much attention has been focused on Nate Silver’s art of predicting the election results with data. Some looked at it from a coverage angle, considering how Silver’s turn in the spotlight will change how elections are covered in the future. John McDermott reports at AdAge that Silver’s work will help shift the “nebulous aspects” of reporting that focus on “feel” and “momentum” toward reporting that is anchored in facts and statistics. ComScore analyst Andrew Lipsman told McDermott, “Now that people have seen [statistics-driven political analysis] proven over a couple of cycles, people will be more grounded in the numbers.”
The attention Silver attracted may also help democratize big data. Tarun Wadhwa reports at Forbes that the power of big data has finally been realized in the US political process:
“Beyond just personal vindication, Silver has proven to the public the power of Big Data in transforming our electoral process. We already rely on statistical models to do everything from flying our airplanes to predicting the weather. This serves as yet another example of computers showing their ability to be better at handling the unknown than loud-talking experts. By winning ‘the nerdiest election in the history of the American Republic,’ Barack Obama has cemented the role of Big Data in every aspect of the campaigning process. His ultimate success came from the work of historic get-out-the-vote efforts dominated by targeted messaging and digital behavioral tracking.”
Michael Scherer at Time has an in-depth look at the role big data and data mining played in Obama’s campaign as well. Campaign manager Jim Messina, Scherer writes, “promised a totally different, metric-driven kind of campaign in which politics was the goal but political instincts might not be the means” and hired dozens of data crunchers to establish an analytics department. The team put together a massive database that merged information from all areas of the campaign — social media, pollsters, consumer databases, fundraisers, etc. — into one central location. Scherer reports: “The new megafile didn’t just tell the campaign how to find voters and get their attention; it also allowed the number crunchers to run tests predicting which types of people would be persuaded by certain kinds of appeals.”
Scherer’s piece is a fascinating look at how data was put to use in a successful presidential campaign. It is this week’s recommended read.
Wikidata's structure vs. diverse knowledge, and a look at the many factors behind Netflix's recommendations.
A critic says Wikidata could undermine Wikipedia's localized information. Also, Netflix explains why its recommendation engine is much more complicated than most people realize.
The 1940 census makes its data debut, and the White House shows off its data initiative.
In this week's data news, the National Archives releases the data from the 1940 Census, the federal government outlines its big data plans, and an app uproar leads to good thinking on privacy and sharing.
Wikileaks and Sealand may not be a good match, ThinkUp reboots, Factual's CEO gets the NYT's attention.
In this week's data news, a look at Sealand as a potential data haven for Wikileaks, ThinkUp reboots, and the New York Times profiles Factual's Gil Elbaz.
A new infographic tool, San Francisco upgrades its open data efforts, and decades of Stephen Wolfram's data.
Visual.ly launches an infographic creation tool, San Francisco upgrades its open data initiative, and Stephen Wolfram offers a peek into more than 20 years of his personal data.
The work of data journalists and a comparison of four data markets.
In this week's data news, data journalists' work gets a closer look, Edd Dumbill surveys four data marketplaces, and the MIT Sloan Sports Analytics Conference experiences impressive growth.
Datasift offers more access to the Twitter archive, and a proposal for a data school.
In this week's data news, Datasift will offer deeper access to old tweets, and P2PU and the Open Knowledge Foundation announce a School of Data.