Strata Week: A call for open science data

Science needs to open up, the murky ownership of UK train data, hacking a Texas town.

Here are some of the data stories that caught my attention this week:

Should scientists share their research data more openly?

London’s Royal Society has launched a study, “Science as a public enterprise,” which will examine “how scientific information should be managed to support innovative and productive research that reflects public values.”

That statement points to two key ideas underlying the Royal Society’s inquiry. First is the importance of public values and public trust in science. No longer can scientists just assume that people will defer to their authority, as the debates over climate change have demonstrated:

It is therefore important that science is not, and is not seen to be, a private enterprise, conducted behind the closed doors of laboratories, but a public enterprise to understand better the world we live in and our place in it. Effective dialogue about the priorities and insights of science and its relation to public values is vital. Scientists can no longer assume an unquestioning public trust.

The other aspect of the Royal Society inquiry involves reconsidering how science is practiced, particularly vis-à-vis open data. In an article in The Lancet, members of the committee contend that scientific scholarship needs to do a better job of making data available. “Conventional peer-reviewed publications generally provide summaries of the available data, but not effective access to data in a useable format.” Yet even as calls grow to make data available to others, the exponential growth in the volume and diversity of data makes accessibility a challenge.

Beyond the question of how scientists can make this data more available, there are questions about who should pay for it; how scientists will handle the need for confidentiality, data security, intellectual property rights, and anonymization; and whether rules on this sort of scientific data sharing could apply globally.


Why is UK train departure data not open data?

Despite the British government’s efforts to open its data, much is still unavailable. The UK-based location data startup Placr has written a blog post explaining why the country’s rail data doesn’t contain train departure information. The explanation, penned by Placr co-founder Jonathan Raper, demonstrates how complex open data efforts can be in terms of technology, bureaucracy, data ownership and control.

The short answer, says Raper, is that the Association of Train Operating Companies, the only group with a train departure information service and API, is a private organization that doesn’t release open data. The API is available under a commercial license, however, and some free licenses have been distributed.

The British rail system was privatized in the mid-1990s, but it remains heavily subsidized by taxpayers. There’s now some confusion, Raper suggests, about whether and when Network Rail (the rail infrastructure owner) counts as a public sector organization and, in the case of rail data, who exactly owns it.

Hacking Tyler, Texas

Christopher Groskopf is back in The Atlantic with his second blog post about his Hack Tyler project. Groskopf is relocating to Tyler, Texas, and is making the most of the move by focusing his developer efforts on the sizable amount of open data made available by Tyler’s local government. Groskopf says he had no idea his project would spark “an unexpected ruckus” from online readers and from Tyler residents.

Groskopf has already created a list of all the data sources he’s been able to identify. “It's been heartening to see how much data actually is available (albeit often in less than ideal formats),” he writes. The data includes a real-time list of where the city’s police officers are responding and almost all of Smith County’s financial documentation.

You can follow the adventures of Hack Tyler here.

A little light reading on MapReduce and Hadoop

If you’re looking to brush up on MapReduce and Hadoop algorithms in your summer reading, then check out this updated list of academic papers. The list includes 35 papers published this year, as well as two new categories: social networking and astronomy.

Got data news?

Feel free to email me.

Photo: Bewdley Station by Dazzie D, on Flickr
