European Union starts project about economic effects of open government data

Earlier this week I talked to writer and open source advocate Marco
Fioretti, who has just announced the start of a study on open data for
the European Union. Fioretti is a long-time supporter of open source
software, which he wrote about in a chapter of the O’Reilly book Open
Government.
Fioretti also held a seminar about open and proprietary formats at
Pisa’s Sant’Anna School of Advanced Studies, a major European college
in the field of economics.

Several problems impelled Fioretti to propose this study:

  • Government claims are hard to verify. When the cost of the huge Strait
    of Messina Bridge project is announced, for instance, how can the
    public determine whether it’s reasonable? (And why, I might add, do
    most projects experience cost overruns but none ever come in under
    budget?)

  • Lots of value is hidden away in government data. When data is released
    without fees or restrictions on use, businesses tend to spring up to
    exploit that data. Maps are one obvious example.

  • Information about open data is scattered. According to Fioretti, 70%
    of public data and online services in Italy are provided by local
    governments. Decentralization increases the effort required to collect
    data about expenditures, and shows that many important decisions are
    taken on a city-by-city basis rather than at a national level.

  • As Fioretti points out in a report from a conference, one can’t
    really anticipate how much economic value one will create by
    releasing data. However, he hopes to be able to quantify the value
    afterward, to generate an incentive to release more data.

  • Data that is not open is siloed. For instance, two adjoining
    geographic regions may have map data, but no one can calculate
    geographic information spanning the two regions. And even if they
    merge their data, they often find that the combined data contains
    errors because of incompatibilities in storage. A river may stop at the
    regional border and start again on the other side a kilometer away,
    for instance. Such errors have to be fixed after the fact at great
    expense.
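
The border mismatch described in the last item can be caught
mechanically before two data sets are merged. Here is a minimal sketch
in Python, assuming that each region stores the river as a list of
(x, y) points in the same projected coordinate system, measured in
meters. The function names, the 10-meter tolerance, and the sample
coordinates are all made up for illustration; none of them come from
Fioretti’s study.

```python
import math


def endpoint_gap(line_a, line_b):
    """Distance between the last point of line_a and the first point of line_b."""
    (x1, y1), (x2, y2) = line_a[-1], line_b[0]
    return math.hypot(x2 - x1, y2 - y1)


def mismatched(line_a, line_b, tolerance_m=10.0):
    """True if the two segments fail to meet within tolerance_m meters."""
    return endpoint_gap(line_a, line_b) > tolerance_m


# Hypothetical river, as recorded by two adjoining regions (meters).
river_west = [(0.0, 0.0), (500.0, 120.0), (1000.0, 200.0)]
river_east = [(1000.0, 1200.0), (1600.0, 1300.0)]

gap = endpoint_gap(river_west, river_east)
print(f"gap at border: {gap:.0f} m, mismatch: {mismatched(river_west, river_east)}")
```

Real map data would probably need a GIS library and a reconciliation
of projections before a comparison like this means anything, which
hints at why fixing such errors after the fact gets expensive.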

Releasing data that was collected for public use with public taxes is
an appealing goal, but it faces innumerable hurdles. First,
governments usually contract out both data collection and data
analysis to private firms. Right away we’re faced with the challenges
of incompatible, proprietary, and even arbitrary formats, along with
the firm’s understandable preference to keep data to itself.

So government contracts must be very specific about the delivery of
the data being commissioned–and not just the data, but the formulas and
software used to calculate results. For instance, if a spreadsheet was
used in calculating the cost of a project, the government should
release the spreadsheet data and formulas to the public in an open
format so that experts can check the calculations.

On top of these barriers lie the usual difficulties of inconsistently
recorded data, missing metadata such as dates and times, etc.

Fioretti hopes to shine a bit more light through all this smoke,
finding out what data is being released right now and how businesses
are using it. He’s concentrating on local governments, first because
of their importance, and second because the data will be more
consistent that way. The structure of government projects and costs
is more similar from one city to another–even across national EU
borders–than from one national government to another.

One phase of the study will be a survey asking cities to give examples
of how the release of local data has enabled new business uses. For
instance, he can take a region that used to sell digital road map
information for thousands of dollars, but recently opened it up for
free, and count the use of that data by businesses in that region
before and after it was opened.
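
The before-and-after comparison could be as simple as the following
Python sketch. It assumes the survey yields, for each business, the
date it first used the road map data; the record layout, the business
names, and the dates are all hypothetical.

```python
from datetime import date


def count_before_after(first_use, opened_on):
    """Count businesses whose first use came before vs. on or after opened_on."""
    before = sum(1 for d in first_use.values() if d < opened_on)
    after = len(first_use) - before
    return before, after


# Hypothetical survey answers: business -> date it first used the data.
first_use = {
    "delivery firm": date(2009, 3, 1),
    "tourism startup": date(2010, 9, 15),
    "real estate portal": date(2010, 11, 2),
    "hiking app": date(2011, 1, 20),
}
opened_on = date(2010, 6, 1)  # assumed date the region dropped its fees

before, after = count_before_after(first_use, opened_on)
print(f"before opening: {before}, after opening: {after}")
```

A raw count like this shows only a change in usage, not its cause, so
it is a starting point for the comparison rather than a conclusion.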

This phase concentrates on small businesses, because large ones
usually can afford the fees charged for data that is not open.

Fioretti plans to use the results of this phase to demonstrate the
value of his study and then launch a large follow-up phase. In that one,
he’ll just ask a large number of cities a few questions about which
data they make open, with which licenses, and in which formats.

Knowing what data is available to the public does not in itself teach
us anything about the economics of open data. But once Fioretti posts
his results–in downloadable format under an open license, of
course–other researchers can correlate the results with other
information gathered about local businesses. So it may take several
years to learn something practical, but the EU should be commended for
trying to quantify the impact of what Tim O’Reilly calls government as
a platform.

Two other new research projects in open government deserve publicity:

  • Jehnavi

    I would say the answer lies somewhere in between. There are clear differences between open government data and open source, but there are a slew of similarities. One of the big differences is that data doesn’t do anything, code does; this is to say, that the operational ability to modify data and “patch” it locally isn’t as important in data vs. code. But that doesn’t mean that users can’t “fork” open government data, especially if an agency, etc. is too stubborn to fix flaws in data sets themselves.
    http://www.onlinenotebook.com/