Strata Week: Data mining for votes

Candidates are data mining behind the scenes, data mining gets a PR campaign, Google faces privacy policy issues, and Hadoop and BI.

Here are a few stories from the data space that caught my attention this week.

Presidential candidates are mining your data

Data is playing an unprecedented role in the US presidential election this year. The two presidential campaigns have access to personal voter data “at a scale never before imagined,” reports Charles Duhigg at the New York Times. The candidate camps are using personal data in polling calls, accessing such details as “whether voters may have visited pornography Web sites, have homes in foreclosure, are more prone to drink Michelob Ultra than Corona or have gay friends or enjoy expensive vacations,” Duhigg writes. He reports that both campaigns emphasized they were committed to protecting voter privacy, but notes:

“Officials for both campaigns acknowledge that many of their consultants and vendors draw data from an array of sources — including some the campaigns themselves have not fully scrutinized.”

A Romney campaign official told Duhigg: “You don’t want your analytical efforts to be obvious because voters get creeped out. A lot of what we’re doing is behind the scenes.”

The “behind the scenes” may be enough in itself to creep people out. These sorts of situations are starting to tarnish the image of the consumer data-mining industry, and a Manhattan trade group, the Direct Marketing Association, is launching a public relations campaign — the “Data-Driven Marketing Institute” — to smooth things over before government regulators get involved. Natasha Singer reports at the New York Times:

“According to a statement, the trade group intends to promote such targeted marketing to lawmakers and the public ‘with the goal of preventing needless regulation or enforcement that could severely hamper consumer marketing and stifle innovation’ as well as ‘tamping down unfavorable media attention.’ As part of the campaign, the group plans to finance academic research into the industry’s economic impact, said Linda A. Woolley, the acting chief executive of the Direct Marketing Association.”

One of the biggest issues, Singer notes, is that people want control over their data. Chuck Teller, founder of Catalog Choice, told Singer that in a recent survey conducted by his company, 67% of people responded that they wanted to see the data collected about them by data brokers and 78% said they wanted the ability to opt out of the sale and distribution of that data.

Google faces EU privacy policy scrutiny and opens its data center doors

European privacy regulators asked Google this week to change the privacy policy it implemented in January to make it easier (or even possible) for users to opt out of personal data collection while still being able to access and use the services. Eric Pfanner and Kevin J. O’Brien report at the New York Times that when Google’s new policy went into effect, it technically followed European regulations by allowing users to opt out, but the way it was set up required users to click an “I agree” button, effectively “opting in,” before using the services.

Regulators also want Google to be more transparent with the kinds of data it’s collecting and how that data is being used. Pfanner and O’Brien report:

“‘The new privacy policy allows an unprecedented combination of data across different Google services,’ Isabelle Falque-Pierrotin, chairwoman of the French data-protection authority, said at a news conference in Paris. ‘We are not opposed to this, in principle, but the data could be employed in ways that the user is not aware of.’”

Falque-Pierrotin said Google has three to four months to comply and may face fines or legal action if it refuses. Google’s global privacy counsel, Peter Fleischer, responded confidently. In a statement, he said: “Our new privacy policy demonstrates our long-standing commitment to protecting our users’ information and creating great products. We are confident that our privacy notices respect European law.”

In other Google news, the company became far more open about its data centers this week, adding photos of several facilities to the data center section of its website. The new transparency coincided with a report from Steven Levy at Wired on how Google builds and operates its data centers, which includes an account of his tour of the facility in Lenoir, N.C. While in-person tours aren’t available to everyone, you can now take a virtual tour of the Lenoir data center, as Google has given it the Street View treatment.

Hadoop and the BI industry

Derrick Harris writes at GigaOm this week that it might be time to reevaluate what Hadoop is as it ventures deeper into business intelligence (BI). He writes:

“Is it a MapReduce framework for heavy-duty batch processing? Yes. But can it also be the engine of high-speed, interactive analytics products that look to do for unstructured data what massively parallel analytic databases do for structured data? As it turns out, the answer might be ‘yes’ again.”

Harris looks at several companies that made BI moves in this direction this week alone: Hadapt, Birst, Splice Machine, and Teradata. He points out that we’re only beginning to see what aligning Hadoop and BI makes possible, and says we likely haven’t seen anything yet.

Study results from Gartner Research published this week seem to support Harris’ sentiment. The study looked at how big data will be driving billions of dollars in spending in the next few years. Alex Williams at TechCrunch highlighted several key points from the research, noting that the influx of big data “will force a change in products, practices and solutions,” and that “[m]aking big data something that has a functional use will drive $4.3 billion in software sales in 2012.”

Tip us off

News tips and suggestions are always welcome, so please send them along.

