What does privacy mean in an age of big data?

Author Terence Craig on why data transparency trumps anonymization.

As we do more online — shop, browse, chat, check in, “like” — it’s clear that we’re leaving behind an immense trail of data about ourselves. Safeguards offer some level of protection, but technology can always be cracked and the goals of data aggregators can shift. So if digital data is and always will be a moving target, how does that shape our expectations for privacy? Terence Craig (@terencecraig), co-author of “Privacy and Big Data,” examines this question and related issues in the following interview.

Your book argues that by focusing on how advertisers are using our data, we might be missing some of the bigger picture. What are we missing, specifically?

Terence Craig: One of the things I tell people is I really don’t care if companies get more efficient at selling me soap. What I do care about is the amount of information that is being aggregated to sell me soap and what uses that data might be put toward in the future.

One of the points that co-author Mary Ludloff and I tried to make in the book is that the reasons behind data collection have nothing to do with how that data will eventually be used. There’s way too much attention being paid to “intrusions of privacy” as opposed to the problem that once data is out there, it’s out there. And potentially, it’s out there as long as electronic civilization exists. How that data will be used is anybody’s guess.

What’s your take on the promise of anonymity often associated with data collection?

Terence Craig: It’s fundamentally irresponsible for anyone who collects data to claim they can anonymize that data. We’ve seen the Netflix Prize de-anonymization, the AOL search-log release, and others. There have been several cases where medical data has been released for laudable goals, only to be de-anonymized rather quickly. For example, the Electronic Frontier Foundation has a piece that explains how a researcher was able to connect an anonymized medical record to former Massachusetts governor William Weld. And in relation to that, a Harvard genome project tries to make sure people understand the privacy risks of participating.

If we assume that companies have good intentions toward their consumers’ data (and I’ll assume that most large corporations do), these companies can still be hacked. They can be taken advantage of by bad employees. They can be required by governments to provide backdoors into their systems. Ultimately, all of this is risky for consumers.

Assuming that data can’t be anonymized and companies don’t have malicious plans for our personal data, what expectations can we have for privacy?

Terence Craig: We’ve moved back to our evolutionary default for privacy, which is essentially none. Hunter-gatherers didn’t have privacy. In small rural villages where multi-generational families shared huts, privacy just wasn’t available.

The question is how we address a society that mirrors our beginnings but comes with one big difference. Before, the people who knew the intimate details of our lives were people we had met in person, and they were often related to us. But now the Internet has erased that geographical boundary, so what does that mean? And how are we as a society going to evolve to deal with that?

With that in mind, I’ve given up on the idea of digital privacy as a goal. I think you have to if you want to reap the rewards of being a full participant in a digitized society. What’s important is for us to make sure we have transparency from the large institutions that are aggregating data. We need these institutions to understand what they’re doing with data and to share that with people so we, in aggregate, can agree whether or not this is a legitimate use of our data. We need transparency so that we — consumers, citizens — can start to control the process. Transparency is what’s important. The idea that we can keep the data hidden or private, well … that horse has left the stable.

What’s the role of governments here, both in terms of the data they keep but also the laws they pass about data?

Terence Craig: Basically anything the government collects, I believe, should be made available. After all, governments are some of the largest aggregators of data from all sorts of people. They either purchase it or demand it for security needs from primary collectors like Google, Facebook, and the cell phone companies. The millions of requests that law enforcement agencies sent to Sprint in 2008-2009 were a big story we mentioned in the book. So it’s important that governments reveal what they’re doing with this information.

Obviously, there’s got to be a balance between transparency and operational security needs. What I want is to have a general idea of: “Here’s what we — the government — are doing with all of the data. Here’s all of the data we’ve collected through various means. Here’s what we’re doing with it. Is that okay?” That’s the sort of legislation I would like, but you don’t see that anywhere at this point.

This interview was edited and condensed.

Privacy and Big Data — This book introduces you to the players in the personal data game, and explains the stark differences in how the U.S., Europe, and the rest of the world approach the privacy issue.
