Earlier this week I talked to writer and open source advocate Marco
Fioretti, who has just announced the start of a
study on open data for the European Union. Fioretti is a
long-time supporter of open source software, which he wrote about in a
chapter of the O’Reilly book Open Government.
Fioretti also held a seminar about open and proprietary formats at
Pisa’s Sant’Anna School of Advanced Studies, a major European college
in the field of economics.
Several problems impelled Fioretti to propose this study:
- Government claims are hard to verify. When the cost of the huge
  Strait of Messina Bridge project is announced, for instance, how can
  the public determine whether it’s reasonable? (And why, I might add,
  do most projects experience cost overruns but none ever come in
  under budget?)
- Lots of value is hidden away in government data. When data is
  released without fees or restrictions on use, businesses tend to
  spring up to exploit that data. Maps are one obvious example.
- Information about open data is scattered. According to Fioretti, 70%
  of public data and online services in Italy are provided by local
  governments. Decentralization increases the effort required to
  collect data about expenditures, and shows that many important
  decisions are taken on a city-by-city basis rather than at a
  national level.
- As Fioretti points out in a report from a conference, one can’t
  really anticipate how much economic value one will create by
  releasing data. However, he hopes to be able to quantify the value
  afterward, to generate an incentive to release more data.
- Data that is not open is siloed. For instance, two adjoining
  geographic regions may each have map data, but no one can calculate
  geographic information spanning the two regions. And even if they
  merge their data, they often find that the combined data contains
  errors because of incompatibilities in storage. A river may stop at
  the regional border and start again on the other side a kilometer
  away, for instance. Such errors have to be fixed after the fact at
  great expense.
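The river mismatch in the last item is the kind of error a simple
endpoint comparison can flag before two data sets are merged. The
sketch below is illustrative only: the coordinates, the 50-meter
tolerance, and the function names are assumptions made up for this
example, not anything taken from Fioretti's study, and it assumes both
regions already store points in the same projected coordinate system,
measured in meters.

```python
import math

# Hypothetical data: each region stores the river as a list of (x, y)
# points in meters, in the same projected coordinate system.
region_a_river = [(0.0, 0.0), (400.0, 120.0), (1000.0, 300.0)]
region_b_river = [(1000.0, 1300.0), (1500.0, 1450.0)]


def border_gap(upstream, downstream):
    """Distance between where one segment ends and the next begins."""
    end_x, end_y = upstream[-1]
    start_x, start_y = downstream[0]
    return math.hypot(start_x - end_x, start_y - end_y)


def is_continuous(upstream, downstream, tolerance_m=50.0):
    """True if the two segments meet within the assumed tolerance."""
    return border_gap(upstream, downstream) <= tolerance_m


gap = border_gap(region_a_river, region_b_river)
ok = is_continuous(region_a_river, region_b_river)
print(f"Gap at border: {gap:.0f} m, continuous: {ok}")
```

A check like this only detects the break. Deciding which region's
coordinates are right still takes human judgment, which is where the
expense comes in.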
Releasing data that was collected for public use with public taxes is
an appealing goal, but it faces innumerable hurdles. First,
governments usually contract out both data collection and data
analysis to private firms. Right away we’re faced with the challenges
of incompatible, proprietary, and even arbitrary formats, along with
the firm’s understandable preference to keep data to itself.
So government contracts must be very specific about the delivery of
the data the government commissions: not just the data, but also the
formulas and software used to calculate results. For instance, if a
spreadsheet was
used in calculating the cost of a project, the government should
release the spreadsheet data and formulas to the public in an open
format so that experts can check the calculations.
On top of these barriers lie the usual difficulties of inconsistently
recorded data, missing metadata such as dates and times, etc.
Fioretti hopes to shine a bit more light through all this smoke,
finding out what data is being released right now and how businesses
are using it. He’s concentrating on local governments, first because
of their importance, and second because the data will be more
consistent that way. The structures of government projects and costs
are more similar from one city to another, even across national
borders within the EU, than from one national government to another.
One phase of the study will be a survey asking cities to give examples
of how the release of local data has enabled new business uses. For
instance, he can take a region that used to sell digital road map
information for thousands of dollars, but recently opened it up for
free, and count the use of that data by businesses in that region
before and after it was opened.
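The before-and-after comparison can be pictured as a simple tally. This
is a minimal sketch under stated assumptions: the record layout, the
business names, the dates, and the release date are all invented for
illustration, and the study's actual survey method may look quite
different.

```python
from datetime import date

# Hypothetical survey records: (business, date it began using the map data).
# Every value here is made up for the example.
usage_records = [
    ("Business A", date(2009, 3, 1)),
    ("Business B", date(2010, 2, 15)),
    ("Business C", date(2010, 6, 30)),
    ("Business D", date(2011, 1, 10)),
]

# Assumed date on which the region opened its data for free.
release_date = date(2010, 1, 1)


def count_before_after(records, release):
    """Count businesses that started using the data before vs. after release."""
    before = sum(1 for _, started in records if started < release)
    after = sum(1 for _, started in records if started >= release)
    return before, after


before, after = count_before_after(usage_records, release_date)
print(f"Before release: {before}, after release: {after}")
```

A raw count like this says nothing about causation on its own, which is
one reason the results need to be correlated with other information
about local businesses.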
This phase concentrates on small businesses, because large ones
usually can afford the fees charged for data that is not open.
Fioretti plans to use the results of this phase to demonstrate the
value of his study and then launch a large follow-up phase. In that one,
he’ll just ask a large number of cities a few questions about which
data they make open, with which licenses, and in which formats.
Knowing what data is available to the public does not in itself teach
us anything about the economics of open data. But once Fioretti posts
his results–in downloadable format under an open license, of
course–other researchers can correlate the results with other
information gathered about local businesses. So it may take several
years to learn something practical, but the EU should be commended for
trying to quantify the impact of what Tim O’Reilly calls government as
a platform.
Two other new research projects in open government deserve publicity:
- Open Source for America has begun a study to measure openness at a
  number of U.S. federal government agencies. O’Reilly Media was a
  founding member of OSFA and I volunteer for them. This survey is
  being carried out in close cooperation with all the major federal
  agencies. First, OSFA is collecting public comment on the measures
  used. You can vote for the traits you consider important from now
  through June 15. OSFA will then send the most relevant questions to
  the federal agencies. Results depend, of course, on whether the
  agencies have collected the relevant information and how candid they
  are in reporting trends. But the questions will be in line with the
  administration’s December 2009 Open Government Directive.
- The Association of Health Care Journalists is asking journalists who
  work in health care to report their attempts to contact the
  Department of Health and Human Services, and how these contacts
  transpired.