Strata Week: Our phones are giving us away

Anonymized phone data isn't as anonymous as we thought, a CFPB API, and NYC's "geek squad of civic-minded number-crunchers."

Mobile phone mobility traces ID users with only four data points

A study published this week in Scientific Reports, “Unique in the Crowd: The privacy bounds of human mobility,” shows that the location data recorded by mobile phones poses an anonymity risk. Jason Palmer reported at the BBC that researchers at MIT and the Catholic University of Louvain reviewed 15 months’ worth of phone records for 1.5 million people and identified “mobility traces,” or “evident paths of each mobile phone.” Only four locations and times were needed to positively identify a particular user. Yves-Alexandre de Montjoye, the study’s lead author, told Palmer that “[t]he way we move and the behaviour is so unique that four points are enough to identify 95% of people.”
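To make the idea of identifying a user from a handful of points concrete, here is a minimal Python sketch. It is not the researchers' code or method: the three traces are invented toy data, and the function names `matching_users` and `fraction_unique` are hypothetical. It simply asks, for every possible choice of p points from a user's trace, whether any other user's trace also contains all of those points.

```python
from itertools import combinations

# Invented toy data: each user maps to a set of (antenna, hour) observations.
traces = {
    "a": {("tower1", 8), ("tower2", 12), ("tower3", 18)},
    "b": {("tower1", 8), ("tower2", 12), ("tower4", 18)},
    "c": {("tower5", 9), ("tower6", 13), ("tower7", 19)},
}


def matching_users(traces, points):
    """Return the users whose trace contains every point in `points`."""
    return [user for user, trace in traces.items() if points <= trace]


def fraction_unique(traces, p):
    """Share of all p-point subsets of a trace that match exactly one user.

    This checks every subset exhaustively, which only works for toy data;
    a real dataset would need random sampling instead.
    """
    total = 0
    unique = 0
    for trace in traces.values():
        for combo in combinations(sorted(trace), p):
            total += 1
            if len(matching_users(traces, set(combo))) == 1:
                unique += 1
    return unique / total if total else 0.0


for p in (1, 2, 3):
    print(p, round(fraction_unique(traces, p), 2))
```

In this toy set, users "a" and "b" share two observations, so a single point often fails to separate them, while three points always do. The same intuition, at a vastly larger scale, is what the study quantifies.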

Privacy International technologist Sam Smith told Palmer that “[a]ny benefits we receive from [location-based mobile phone] services are far outweighed by the threat that these trends pose to our privacy,” but de Montjoye disagreed: “We really don’t think that we should stop collecting or using this data — there’s way too much to gain for all of us — companies, scientists, and users. … We’ve really tried hard to not frame this as a ‘Big Brother’ situation … But we show that even if there’s no name or email address, it can still be personal data, so we need it to be treated accordingly.”

David Meyer at GigaOm reported on the study as well, noting that there isn’t an easy answer — we can’t have location-based services without providing location data, we want companies to gather data to help fight real-world problems like disease and poverty, and as much as consumers “insist they want better data privacy,” they “still give up their data without much consideration.” Meyer suggested taking a more pragmatic approach to data privacy:

“What we need is a new realpolitik for data privacy. We are not going to stop all this data collection, so we need to develop workable guidelines for protecting people. Those developing data-centric products also have to start thinking responsibly — and so do the privacy brigade. Neither camp will entirely get its way: there will be greater regulation of data privacy, one way or another, but the masses will also not be rising up against the data barons anytime soon.”

Meyer said his fear is that useful regulation will only come as a reactive measure to some horrible event that necessitates it. You can read his full report at GigaOm.

The CFPB releases its data as an API

The Consumer Financial Protection Bureau (CFPB) announced this week it’s expanding its Consumer Complaint Database to include 90,000 searchable financial complaints, making it “the largest public database of federal consumer financial complaints” in the U.S., according to the press release.

In a blog post at the CFPB website, Scott Pluta outlined the additional data areas, including mortgage complaints, complaints regarding bank accounts and services, private student loan complaints, consumer loan complaints, and more specific information about the product in each complaint — such as identifying a “mortgage” as a “reverse mortgage” or a “conventional mortgage” to provide more accurate data. Pluta also noted that more expansions are coming, but that users don’t have to wait on the CFPB:

“You don’t have to wait for us to build what you’d like to see from the data. We’re releasing this data as an API, as well as in CSV, JSON, PDF, RDF, RSS, XLS, XLSX, and XML — and we’d love to see what you can do with it.”

Pluta said the CFPB is encouraging public users, “including consumers, analysts, data scientists, civic hackers, and companies that serve consumers, to analyze, augment, and build on the information in the database to develop ways for consumers to use the complaint data or mash it up with other public data sets to reveal potential trends.” Additionally, the press release pointed out that the live database, which has more than one million data points covering about 450 companies, is updated daily, so more complaints will be added as they are processed. The press release also asks users who build something with the data, or who come upon others innovating with it, to highlight the work by tweeting @CFPB using the #CFPBdata hashtag.
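As a rough illustration of the kind of analysis the CSV export invites, the following Python sketch tallies complaints by product and by company. Everything specific in it is an assumption: the rows are invented, and the column names `Product`, `Sub-product`, and `Company` are placeholders that may not match the real export, so check the actual file's header before adapting it.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a downloaded CSV export.
# The column names here are assumptions, not the database's documented schema.
SAMPLE_CSV = """Product,Sub-product,Company
Mortgage,Reverse mortgage,Bank A
Mortgage,Conventional mortgage,Bank B
Student loan,Private student loan,Bank A
Bank account or service,Checking account,Bank C
Mortgage,Conventional mortgage,Bank A
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, column):
    """Count how many rows share each value in the given column."""
    return Counter(row[column] for row in rows)


rows = load_rows(SAMPLE_CSV)
by_product = count_by(rows, "Product")
by_company = count_by(rows, "Company")

for product, n in by_product.most_common():
    print(f"{product}: {n}")
```

To work with a real download, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle. The counting logic stays the same.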

NYC geek squad

Alan Feuer profiled New York City’s Office of Policy and Strategic Planning, which he calls “a geek squad of civic-minded number-crunchers,” in the New York Times this week. He summarized some of the group’s successes since its launch:

“For the modest sum of $1 million, and at a moment when decreasing budgets have required increased efficiency, the in-house geek squad has over the last three years leveraged the power of computers to double the city’s hit rate in finding stores selling bootleg cigarettes; sped the removal of trees destroyed by Hurricane Sandy; and helped steer overburdened housing inspectors — working with more than 20,000 options — directly to lawbreaking buildings where catastrophic fires were likeliest to occur.”

Feuer highlighted Michael Flowers’ journey to become head of the group, including an assignment in Iraq, where his team was protected by military officers using predictive analytics to determine where and when bombs were likely to explode. He also looked at some of the team’s work, such as the discovery that it’s “mathematically possible to create safer streets by encouraging local businesses to keep their doors open later after dark,” and noted that the team processes a terabyte of raw data daily.

Looking at future projects, Feuer noted, Flowers is interested in mining the city’s social media data, including its 250,000 New York-centric tweets. “If Young & Rubicam can use tweets to sell you stuff,” he asked in a meeting Feuer attended, “why can’t the city use them to make you less sick?” You can read Feuer’s full feature piece at the New York Times.

Flowers also talked in depth on the work of the Office of Policy and Strategic Planning at the recent Strata NY conference. “The office is pretty much designed around trying to figure out how to take what we know about our locations, and our businesses and our people and turn it into something useable,” he said in a keynote address. Flowers’ full keynote is also available as a video.

Tip us off

News tips and suggestions are always welcome, so please send them along.
