Additional iPhone tracking research

Researchers and reporters are exploring many of the issues related to mobile location data.

Update, 4/27/11 — Apple has posted a response to questions raised in this report and others.

By Alasdair Allan and Pete Warden

Here are the latest developments on iPhone tracking.

Android records a short log

The Guardian has a good overview of Android’s equivalent to consolidated.db. It records only the last 50 cell locations and the last 200 Wi-Fi networks, and older entries are overwritten. As we mentioned in our original video, this was what we expected to find on the iPhone when we came across the file. It was the sheer scale and duration of the recording that floored us, along with how easy it was to access on your computer. Android doesn’t appear to copy the file over when you sync, so you’d need physical access to the phone to read it.
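To make the difference concrete, here is a minimal sketch of a log that keeps only the most recent entries, using the 50 and 200 limits quoted above. The class and method names are hypothetical and this is not how Android actually implements its cache; it only illustrates the overwrite behaviour.

```python
from collections import deque

# Limits taken from the figures quoted above. Everything else in this
# sketch (class name, entry format) is a made-up illustration.
CELL_CAPACITY = 50
WIFI_CAPACITY = 200


class BoundedLocationLog:
    """Keeps only the newest entries; the oldest are silently dropped."""

    def __init__(self, cell_capacity=CELL_CAPACITY, wifi_capacity=WIFI_CAPACITY):
        # A deque with maxlen discards items from the left once it is full.
        self.cells = deque(maxlen=cell_capacity)
        self.wifi = deque(maxlen=wifi_capacity)

    def record_cell(self, entry):
        self.cells.append(entry)

    def record_wifi(self, entry):
        self.wifi.append(entry)


log = BoundedLocationLog()
for i in range(60):
    log.record_cell(("cell", i))  # placeholder entries, not real data
```

After 60 insertions the log holds only entries 10 through 59, which is why a short log like this can't reveal a year of movements the way an unbounded one can.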

Phoning home your location

In the Wall Street Journal there’s a good story covering how phones often send your location back to servers at both Apple and Google. We’ve known that cell companies are gathering this kind of data, because they need it for their basic operations, but the most interesting question for me is how it’s actually stored by these software companies. If it’s truly just for improving their location services, it could be anonymized so that it would be hard to figure out an individual’s movements even if you had the data. Even if it’s not anonymized, the data is somewhat protected when it’s on a company’s internal network, since that keeps it further out of reach than a file that’s held on your machine.

Better for tracking travel than home or office locations

Sean Gorman and my friend Peter Batty have done some impressive work digging into the details of the location data. Their conclusion is that it’s hard to spot locations where you spend a lot of time in the same place, like your house or place of work. It’s almost as if re-visiting the same spot overwrites a lot of the older data for that place, which would fit with a lot of what we’ve seen. They also try to quantify the accuracy of the location, pointing out how many outliers appear.

Even just showing where you’ve been traveling to is pretty concerning, but it’s good to rule out some malicious uses. The work they’ve done tells us a lot more about the characteristics of the data, and I’m looking forward to seeing more of this kind of analysis.

Intriguingly, their work also lends some support to Will Clarke’s idea that the locations are associated with cell towers. Peter’s data shows a cluster around Mile High Stadium, which he hasn’t visited recently but which does have a lot of cell infrastructure. Sean has another map that overlays actual tower locations with his points, and it’s clear the two don’t coincide, though the points could well be triangulated from multiple towers. Sean’s observation fits with our initial hypothesis that the locations are the result of sometimes-inaccurate triangulation from towers, but Peter’s is evidence that there’s a bias in the data to clustering around tower positions.
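One rough way to put a number on "don't coincide" is to measure how far each logged point sits from its nearest known tower. The sketch below assumes you already have both sets of coordinates as (latitude, longitude) pairs; the function names are ours, and it is not the method Sean or Peter used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_tower_km(point, towers):
    """Distance from one logged point to the closest tower in the list."""
    return min(haversine_km(point[0], point[1], t[0], t[1]) for t in towers)


# Placeholder coordinates for illustration only, not real tower positions.
towers = [(0.0, 1.0), (0.0, 2.0)]
distance = nearest_tower_km((0.0, 0.0), towers)
```

If most points land within a few metres of a tower, the data is probably recording tower positions directly. Distances spread over hundreds of metres would fit the triangulation hypothesis better.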

Peter is investigating the WiFiLocation table. This typically contains a lot more points than the cell version, with 219,000 entries in Alasdair’s data versus only 29,000 cell points. We didn’t visualize this in the application because the derived lat/long points are a lot noisier, but that may be an issue with the quality of the location-lookup tables Apple are using since they switched away from SkyHook. It appears to record the ID of many of the WiFi networks you’ve come into range of, so I’ll be interested to see what Peter and others discover about this data.
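If you want to compare the sizes of the two tables in your own copy of the file, a row count is enough. This sketch assumes the cell table is named CellLocation (the post only names WiFiLocation) and builds a tiny in-memory stand-in with made-up columns so it runs without a real consolidated.db. To try it on your own data, pass a connection opened on your file instead.

```python
import sqlite3

# "WiFiLocation" is named in the post; "CellLocation" is our assumption
# for the name of the cell table.
KNOWN_TABLES = ("CellLocation", "WiFiLocation")


def count_rows(conn, table):
    """Return the number of rows in one of the known location tables."""
    # Table names can't be bound as SQL parameters, so check them first.
    if table not in KNOWN_TABLES:
        raise ValueError("unexpected table name: " + repr(table))
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]


def build_demo_db():
    """In-memory stand-in with hypothetical columns and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CellLocation (Timestamp REAL, Latitude REAL, Longitude REAL)")
    conn.execute(
        "CREATE TABLE WiFiLocation (MAC TEXT, Timestamp REAL, Latitude REAL, Longitude REAL)")
    conn.executemany(
        "INSERT INTO CellLocation VALUES (?, ?, ?)",
        [(float(i), 0.0, 0.0) for i in range(3)])
    conn.executemany(
        "INSERT INTO WiFiLocation VALUES (?, ?, ?, ?)",
        [("00:00:00:00:00:%02x" % i, float(i), 0.0, 0.0) for i in range(7)])
    return conn


demo = build_demo_db()
counts = {table: count_rows(demo, table) for table in KNOWN_TABLES}
```

The demo rows are placeholders, so the counts here say nothing about real data. On Alasdair's file the same comparison gave the 219,000 versus 29,000 figures mentioned above.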

