Putting Online Privacy in Perspective

When I wrote last week about the Facebook privacy flap, I was speaking out of the frustration that many technologists with a sense of perspective feel when we see uninformed media hysteria about the impact of new technology. (How many of you remember the scare stories from the mid-1990s about the risks of using a credit card online, all of them ignoring the risks that consumers blithely took for granted in the offline world?)

Search engine expert Danny Sullivan vented some of this frustration on a private mailing list the other day. He gave me permission to reprint his remarks here. Danny was responding to a discussion of a Washington Post story about online privacy that started out with concerns about how information posted online is routinely being discovered and used against people in legal cases. (But even then, as you’ll see, they left out a crucial part of the story.)

The story then goes on to link these cases to data collection online in general:

In the 15 years since the World Wide Web brought the Internet to the masses, the most successful companies have been those that collect information about users and use it to sell things. Google, for instance, has confirmed that it keeps track of search queries sent from a particular IP address. (A spokesman said the company anonymizes IP addresses associated with search queries after nine months and cookies after 18 months.)

Companies are loath to talk about what information they track, but internal compliance manuals for law enforcement for Facebook, Yahoo and Microsoft reviewed by The Washington Post show that their data collection is much more extensive than users might believe based on what they themselves can access.

For example: Microsoft tracks the Xbox LIVE start and end dates and times for game-playing and notes the game played, such as “SW: Jedi Academy.” Yahoo keeps chat and instant messenger logs for 45 to 60 days and notes the time/date and IP address for when content is added or deleted to someone’s profile or to its Flickr photo service.

Facebook’s data collection is among the most detailed.

For every user id, Facebook keeps a log of the IP address that accessed the account, the date and time, and what exactly the user did — clicking on an advertisement, looking at someone else’s profile, posting a photo or sending a message to a friend, etc.

The problem with linking these two ideas is that the data in the examples above is exactly the kind of data online companies need to collect in order to manage and improve their services. It is a lot like the data collected by your car: some of it, like your speed, is reported to you, and much of it is reported only to a mechanic via a diagnostic computer. Collecting this kind of data is no surprise to computer professionals; it's taught as basic practice!

Danny was particularly put off by the hysteria about well-known facts, and by the scrutiny given to trivial pieces of online data collection while ignoring far more massive collection of data by more familiar means. He wrote:

Heh. Google has confirmed it tracks queries to a particular IP address. Like
this wasn’t something we knew for any search engine back in, say, 1995. Or as
if Google ever made a secret of it. Or more to the point, like tracking to
an IP address is the issue versus the bigger issue of people having search
histories (if people opt in) linked to real, personally identifiable
accounts.

Heaven help us, though — let’s keep talking IP addresses and cookies. And
let’s ignore the fact that in virtually every court case where search
queries have been notable as evidence, those queries were obtained … wait
for it … off the person’s own computer. Dude, when you’re searching for
ways to kill your wife, clear your browser history. Seriously, sad but true
story.

I think the internet companies are indeed going to face more scrutiny,
because they are big fat targets for lazy legislators who are loath to
provide some real security over, I dunno, my credit card purchases?

I mean, can you imagine if when using Google and Yahoo and Bing, they
reported all your searches to a “search bureau” that was pretty easy for
anyone to access? Oh, and if you disagreed with something listed, well, good
luck with getting that removed. But we tolerate that bull from our credit
card companies.

My credit card company knows everything I’ve purchased, which is a pretty
personal trail. That doesn’t get “anonymized” after 9 months or 18 months. I
have no idea at all what happens to it. I can’t, like at Google, push a
button and make it go poof, either. I don’t think I have any rights over it
at all.

My grocery store knows all the things I’ve purchased using my store discount
card — no idea who they hand that out to.

My telephone company keeps my phone records for I don’t know how long.
Imagine that. They know who I called and for how long.

But yeah, thank you Washington Post for focusing on the fact that Xbox Live
keeps track of when I began and ended my game playing. Yeah, thanks for
spending time talking about IP addresses. Could they have shoved even one
paragraph of perspective in there? Could we get one of the privacy groups to
maybe call for some better national standards protecting user information on
and OFFline? If they are, I never hear the offline part.

Rant over. I’ve just seen this same obsession with IP addresses over years.
Years and years, rather than focusing on the bigger and more important
privacy issues on a broader perspective.

There are real privacy issues to be faced in the data collected by web companies. But they are part of a far bigger picture of how the world is changing. We need a thoughtful understanding of what the real risks are, not finger-pointing by the media (and, even more frighteningly, by members of Congress) at companies that are easy targets because they make good political theater.