Andy Oram

European Union starts project about economic effects of open government data

by Andy Oram | @praxagora | Comments: 1 | 11 June 2010

Earlier this week I talked to writer and open source advocate Marco Fioretti, who has just announced the start of a study on open data for the European Union. Fioretti is a long-time supporter of open source software, which he wrote about in a chapter of the O'Reilly book Open Government. Fioretti also held a seminar about open and proprietary formats at Pisa's Sant'Anna School of Advanced Studies, a major European college in the field of economics.

Several problems impelled Fioretti to propose this study:

  • Government claims are hard to verify. When the cost of the huge Strait of Messina Bridge project is announced, for instance, how can the public determine whether it's reasonable? (And why, I might add, do most projects experience cost overruns but none ever come in under budget?)
  • Lots of value is hidden away in government data. When data is released without fees or restrictions on use, businesses tend to spring up to exploit that data. Maps are one obvious example.
  • Information about open data is scattered. According to Fioretti, 70% of public data and online services in Italy are provided by local governments. Decentralization increases the effort required to collect data about expenditures, and shows that many important decisions are made on a city-by-city basis rather than at a national level.
  • As Fioretti points out in a report from a conference, one can't really anticipate how much economic value one will create by releasing data. However, he hopes to be able to quantify the value afterward, to generate an incentive to release more data.
  • Data that is not open is siloed. For instance, two adjoining geographic regions may each have map data, but no one can calculate geographic information spanning the two regions. And even if they merge their data, they often find that the combined data contains errors because of incompatibilities in storage. A river may stop at the regional border and start again on the other side a kilometer away, for instance. Such errors have to be fixed after the fact at great expense.

Releasing data that was collected for public use with public taxes is an appealing goal, but it faces innumerable hurdles. First, governments usually contract out both data collection and data analysis to private firms. Right away we're faced with the challenges of incompatible, proprietary, and even arbitrary formats, along with the firm's understandable preference to keep data to itself.

So government contracts must be very specific about the delivery of the data that the government commissions--and not just the data, but the formulas and software used to calculate results. For instance, if a spreadsheet was used in calculating the cost of a project, the government should release the spreadsheet data and formulas to the public in an open format so that experts can check the calculations.

On top of these barriers lie the usual difficulties of inconsistently recorded data, missing metadata such as dates and times, etc.

Fioretti hopes to shine a bit more light through all this smoke, finding out what data is being released right now and how businesses are using it. He's concentrating on local governments, first because of their importance, and second because the data will be more consistent that way. The structure of government projects and costs is more similar from one city to another--even across national EU borders--than from one national government to another.

One phase of the study will be a survey asking cities to give examples of how the release of local data has enabled new business uses. For instance, he can take a region that used to sell digital road map information for thousands of dollars, but recently opened it up for free, and count the use of that data by businesses in that region before and after it was opened.

This phase concentrates on small businesses, because large ones usually can afford the fees charged for data that is not open.

Fioretti plans to use the results of this phase to demonstrate the value of his study and then launch a larger follow-up phase. In that one, he'll simply ask a large number of cities a few questions about which data they make open, with which licenses, and in which formats.

Knowing what data is available to the public does not in itself teach us anything about the economics of open data. But once Fioretti posts his results--in downloadable format under an open license, of course--other researchers can correlate the results with other information gathered about local businesses. So it may take several years to learn something practical, but the EU should be commended for trying to quantify the impact of what Tim O'Reilly calls government as a platform.

Two other new research projects in open government deserve publicity:

Comments: 1

Jehnavi [11 June 2010 11:35 PM]

I would say the answer lies somewhere in between. There are clear differences between open government data and open source, but there are a slew of similarities. One of the big differences is that data doesn’t do anything, code does; this is to say, that the operational ability to modify data and “patch” it locally isn’t as important in data vs. code. But that doesn’t mean that users can’t “fork” open government data, especially if an agency, etc. is too stubborn to fix flaws in data sets themselves.
http://www.onlinenotebook.com/