Strata Week: New life for an old census

Here are a few of the data stories that caught my attention this week.

Now available in digital form: The 1940 census

The National Archives released the 1940 U.S. Census records on Monday, after a mandatory 72-year waiting period. The release marks the single largest collection of digital information ever made available online by the agency.

Screenshot from the digital version of the 1940 Census.

The 1940 Census, conducted as a door-to-door survey, included questions about age, race, occupation, employment status, income, and participation in New Deal programs. All of these were important (and intriguing) topics in the wake of the previous decade's Great Depression. One data point: in 1940, there were 5.1 million farmers. According to the 2010 American Community Survey (not the census, mind you), there were just 613,000.

Interest in these sorts of insights proved far greater than the National Archives anticipated, and the traffic load temporarily brought down the website hosting the data, Archives.com. The site is now back up, so anyone can investigate the records of approximately 132 million Americans. The records are searchable by map (or rather, by “the appropriate enumeration district”) but not by name.

A federal plan for big data

The Obama administration unveiled its “Big Data Research and Development Initiative” late last week, with more than $200 million in financial commitments. Among the White House’s goals: to “advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data.”

Several departments and agencies signed on to the new big data initiative with specific plans at launch. These include grant opportunities from the Department of Defense and the National Science Foundation, new DARPA spending on an XDATA program to build computational tools, and open data initiatives such as the 1000 Genomes Project.

“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use big data for scientific discovery, environmental and biomedical research, education, and national security,” said Dr. John P. Holdren, assistant to the President and director of the White House Office of Science and Technology Policy, in the official press release.

Personal data and social context

When the Girls Around Me app was released, using data from Foursquare and Facebook to notify users when there were females nearby, many commentators called it creepy. “Girls Around Me is the perfect complement to any pick-up strategy,” the app’s website once touted. “And with millions of chicks checking in daily, there’s never been a better time to be on the hunt.”

“Hunt” is an interesting choice of words here, and Cult of Mac, among other blogs, asked whether the app was encouraging stalking. The outcry prompted Foursquare to revoke the app’s API access, and its developers later voluntarily pulled it from the App Store.

Many of the responses to the app raised issues about privacy and user data, and questioned whether women in particular should be extra cautious about sharing their information with social networks. But as Amit Runchal writes in TechCrunch, this response blames the victims:

“You may argue, the women signed up to be a part of this when they signed up to be on Facebook. No. What they signed up for was to be on Facebook. Our identities change depending on our context, no matter what permissions we have given to the Big Blue Eye. Denying us the right to this creates victims who then get blamed for it. ‘Well,’ they say, ‘you shouldn’t have been on Facebook if you didn’t want to …’ No. Please recognize them as a person. Please recognize what that means.”

Writing here at Radar, Mike Loukides expands on some of these issues, noting that the questions are always about data and social context:

“It’s useful to imagine the same software with a slightly different configuration. Girls Around Me has undeniably crossed a line. But what if, instead of finding women, the app was Hackers Around Me? That might be borderline creepy, but most people could live with it, and it might even lead to some wonderful impromptu hackathons. EMTs Around Me could save lives. I doubt that you’d need to change a single line of code to implement either of these apps, just some search strings. The problem isn’t the software itself, nor is it the victims, but what happens when you move data from one context into another. Moving data about EMTs into context where EMTs are needed is socially acceptable; moving data into a context that facilitates stalking isn’t acceptable, and shouldn’t be.”


Got data news?

Feel free to email me.
