Data News: Week in Review

Tracking data found in iOS 4, crowdsourcing is questioned, and the Senate doesn't get "open data"

The Where 2.0 Conference was held April 19 – 21 in Santa Clara, Calif., so it’s no surprise there were plenty of location-based developments to talk about this week in the data space. Here are a few of the data stories — place-based and otherwise — that caught my eye.

Your iPhone tracks your location

On Wednesday, Pete Warden and Alasdair Allan made headlines with their discovery of an iPhone file that tracks its owner’s location. The iPhone appears to use cell-tower triangulation to periodically record the user’s latitude and longitude, storing the data in a file that lives on the phone and is transferred to the user’s computer whenever the device is synced.

According to their research, the file appears to have arrived with the iOS 4 update, since that’s the point at which the recordings start. The file’s existence raises some questions (what does Apple plan to do with this data?), but more disconcerting may be that the file is unencrypted, leaving this trove of location data stored locally but unprotected. Apple does not appear to transmit the data, but no other device seems to keep a comparable file, according to Warden and Allan.

While there are questions about privacy and security here, the data is quite compelling, thanks in no small part to the iPhone Tracker tool Warden and Allan built, which reads this file on a user’s computer and visualizes their movements. Your phone has surreptitiously been tracking you, but the maps replay a fascinating and fairly accurate record of where you’ve travelled since June 2010.

Crowdsourced data versus “real statistics”

Ushahidi co-founder Erik Hersman wrote a strong defense of crowdsourced data this week in his post, “The Immediacy of the Crowd.” His post was a response to one that appeared last month on the blog of the social enterprise organization Benetech. The title of the latter post, “Crowdsourced data is not a substitute for real statistics,” probably shows at once why Ushahidi would object.

The Benetech post (along with a subsequent Fast Company article) suggests that crowdsourced data from mobile phones and SMS can “lead rescue teams in the wrong direction” and that such data may be unsuitable for statistical analysis or modeling.

On one hand, this is an interesting and important academic debate. Which is better, crowdsourced data or statistical patterns? Are there patterns in crowdsourced data that we can use, in aggregate or as predictions in real time?

But the back-and-forth between the blogs, as Hersman observes in his post, overlooks an important element: crisis response is messy and hardly a “clinical environment where we all get to sit back, sift data and take our time to make a decision.”

U.S. Senate finally releases its financial data … in PDF

It’s been almost two years since the U.S. Senate agreed to make the official record of its expenditures publicly available online. This week the Senate finally revealed its plan for releasing the information. According to the Sunlight Foundation, the Senate will begin releasing records in November, covering the period from April to September.

But the data will be in PDF format. As the Sunlight Foundation notes with dismay:

The legislation was rather clearly intended to create the release of actual data, not data in the difficult-to-reuse form of a paper document. Unfortunately, PDF documents can meet the standard of searchable (as long as the text is exposed), and itemized (if the items are listed), so the Senate is getting by on a technicality, and reaching for the lowest common denominator.

How do we demand more accessible, structured datasets? Or, how do we challenge the PDF?

Got data news?

Feel free to email me.




  • Alex Tolley

    Or, how do we challenge the PDF?

    Surely someone can manipulate the PDF file, generate a friendlier, user-searchable version, and post it?