Governments looking for economic ROI must focus on open data with business value

There’s increasing interest in the open data economy from the research wings of consulting firms. Capgemini Consulting just published a new report on the open data economy. McKinsey’s Global Institute is following up its research on big data with an inquiry into open data and government innovation. Deloitte has been taking a long look at open data business models. Forrester says open data isn’t (just) for governments anymore and says more research is coming. If Bain & Company doesn’t update its work on “data as an asset” this year to meet inbound interest in open data from the public sector, it may well find itself in the unusual position of lagging the market for intellectual expertise.

As Radar readers know, I’ve been trying to “make dollars and sense” of the open data economy since December, looking at investments, business models and entrepreneurs.

In January, I interviewed Harvey Lewis, the research director for the analytics department of Deloitte U.K. Lewis, who holds a doctorate in hypersonic aerodynamics, has been working for nearly 20 years on projects in the public sector, defense industry and national security. Today, he’s responsible for applying an analytical eye to consumer businesses, manufacturing, banking, insurance and the public sector. Over the past year, his team has been examining the impact of open data releases on the economy of the United Kingdom. The British government’s embrace of open data makes such research timely.

Given the many constituencies interested in open data these days, from advocates for transparency and good government to organizations interested in co-creating civic services to entrepreneurs focused on building and scaling sustainable startups, one insight stood out from our discussion in particular:

“The things you do to enable transparency … aren’t necessarily the same things you do to enable economic growth and economic impact,” said Lewis.

“For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.”

The rest of our interview, lightly edited for content and clarity, follows.

Why is Deloitte interested in open data?

Harvey Lewis: In late 2011, we realized that open data was probably going to be one of those areas that was likely to be transformational, maybe not in the short term, but certainly in the long term. A lot of the technology that companies are using to do analysis of data will become increasingly commoditized, so the advantage that people were going to get was going to come through their interpretations of data and by looking for other commercial mechanisms for getting value from data.

The great thing about open data is that it provides those opportunities. It provides, in some ways, a level playing field and ways of creating revenue and opportunities that just don’t exist in other spaces.

You’ve been investigating the demand for open data from businesses. How have you approached the research?

Harvey Lewis: We’ve been working with Professor Nigel Shadbolt in the U.K., who is one of the great champions on the global stage for open data. He and I started work on our open data activity about 12 months ago.

Our interest was not so much in open government data but more the spectrum of open data, from government, business and individual citizens. We thought we would run an exercise over the spring of 2012, inviting various organizations to come and debate open data. We were very keen to get a cross-section of people from public and private sectors in those discussions because we wanted to understand what businesses thought of open data. We published a report [PDF] in June of last year, which was largely qualitative, looking at what we thought was happening in the world of open data, from a business perspective.

There were four main hypotheses to that vision:

The first part was that we thought every business should have a strategy to explore open data. If you look at the quantity of data that’s now available globally, even just from government, it’s an extraordinary amount, if you measure it just by the number of datasets that are published. In the U.K., it’s in the tens of thousands. In the U.S., it’s in the hundreds of thousands. There’s a vast resource of data that’s freely available that can be used to supplement existing sources of information, proprietary or otherwise, and enrich companies’ views of the world.

The second part was that businesses themselves would start to open up their data. There are different ways of gaining revenue and value from data if they opened it up. This was quite a controversial subject, as I’m sure you might imagine, in some of the discussions. Nevertheless, we’re starting already to see companies releasing subsets of their data on competition websites, inviting the crowd to come up with innovative solutions. We’re also seeing evidence that companies are releasing their data to improve the way they interact with their customers. I think one of the great broad impacts of businesses opening up their data is reputational enhancement — and that can have a real economic benefit.

The third part of our hypothesis was that open data would inspire customer engagement. That is, I think, a great topic for exploration within the public sector itself. Releasing this data isn’t just about “publishing it and they will come” — it’s about releasing data and using that data to engage in a different type of conversation with citizens and consumers.

Certainly in the U.K., we’re starting to see the fruits of that and some new initiatives. There’s a concept called “midata” in the U.K., where the government is encouraging service providers to release consumer data back to individuals so they can shop around for the best deals in the market. I think that’s a great vision for open data.

The fourth part was the privacy and the ethical responsibilities that come with the processing of open data, with companies and government starting to work more closely together to come up with a new paradigm for responsibility and privacy.

Nigel Shadbolt and I committed to doing further work on the economic business case for open data to try to address some of these hypothetical views of the future.

That launched this second phase of our work, which was trying to quantify that economic benefit. We decided very early on, because of Nigel Shadbolt’s relationship to the Open Data Institute, to work closely with that organization, as it was born in the summer of 2012.

We spent a lot of time gathering data. Particularly, we were looking at whether or not we could infer from the demand for open data from a variety of government portals what the economic benefit would be. We looked to a number of other measures and data sources, including a very broad balance sheet analysis to try to infer how companies were increasingly using data to run their businesses and benefit their businesses.

What did you find in this inquiry?

Harvey Lewis: We published a second report, called “Open Growth,” in early December of last year. The fundamental problem in trying to estimate the economic benefit is around, essentially, a lack of data. It sounds quite ironic, doesn’t it, that there’s a lack of data to quantify the effect of open data?

In particular, it’s still early days for determining economic benefit. When you’re trying to uncover second-order effects in the economy due to open data, it’s very early days to be able to see those effects percolate through different sectors. We were really challenged. Nevertheless, we were able to look quite closely at the sorts of data that the U.K. government had been publishing and draw some conclusions about what that meant for the economy.

For example, we were able to categorize nearly 40,000 datasets that are publicly available from the U.K. government and other public bodies in the U.K. into a number of discrete categories. Thirty-three percent of the data that was being published by the government was related to government expenditure. A large slice of the data that was being supplied had to do with the economy, demographics and health.

Does more transparency lead to positive economic outcomes?

Harvey Lewis: In the U.K., and certainly to some extent in the U.S., there are multiple objectives at work in open data.

One of the primary objectives is transparency, publishing data that allows citizens to really kick the tires on public services, hopefully leading them to be improved, to increase quality and choice for individual citizens.

The things you do to enable transparency, however, aren’t necessarily the same things you do to enable economic growth and economic impact. For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.

Put some guarantees around those data sources to preserve their formats, longevity and utility, so that businesses have the confidence to use them and start building companies on the backs of them. Investors have got to have confidence that data will be available in the long term.

Those are the steps you take for economic growth. They’re quite different from the steps you might take for transparency, which is about making sure that all data that has a potential bearing on public services and cities and interpretation of government policy is made available.

You defined five business model archetypes in your report: “suppliers, aggregators, developers, enrichers and enablers.” Which examples have been sustainable?

Harvey Lewis: In coming up with that list, we did an analysis of as many companies as we could find. We tried to appraise business models from publicly available information to get a better understanding of what they were doing with the data and how they were generating revenue from it.

We had a long list of about 15 or 16 discrete business models that we were then able to cluster into these five archetypes.

Suppliers are publishing open data, including, of course, public sector bodies. Some businesses are publishing their data. While there may be no direct financial return if they publish data as open data and make it freely available, there are nevertheless other benefits that are going to become very meaningful in the future.

It’s something that a lot of businesses won’t be able to ignore, particularly when it comes to sustainability and financial data. Consumers are putting a lot of businesses under a great deal of scrutiny now to make sure that businesses are operating with integrity and can be trusted. A lot of this is about public good or customer good, and that can be quite intangible.

The second area, aggregators, is perhaps the largest. Organizations are pooling publicly available data, combining it and producing insights from it that are useful. They’re starting to sell those insights to businesses. One example in the report takes open data from the public body that all companies that are operating in the U.K. have to register with. They combine that data with other sources from the web, social media and elsewhere to produce intelligence that other businesses can use. They’re growing at quite a phenomenal rate.

We’re seeing a decline of organizations that are purely aggregating public sources of information. I don’t think there’s a sustainable business model there. Particular areas, like business intelligence, energy and utilities, are taking public data and are getting insights. It’s the insights that have monetary value, not the data itself.

The third are the classic app developers. This is of greatest interest where the data that is provided by the public sector is granular, real-time, updated frequently and close to the hearts of ordinary citizens. Transport data, crime data, and health data are probably the three types of data where software developed on the back of that data is going to have the greatest impact.

In the U.K., we’re seeing a lot of transport applications that enable people to plan journeys across what is, in some cases, quite a fragmented transport infrastructure — and get real benefits as a result. I think it’s only a matter of time before we start to see health data being turned into applications in exactly the same way, allowing individuals to make more informed choices, understand their own health and how to improve it and so on.

The fourth area, enrichers, is a very interesting one. We think this is the “dark matter” of the open data economy. These are larger, typically established businesses that are hoovering up significant quantities of open data and combining it with their own proprietary sources to offer services to customers. These sorts of services have traditionally existed and aren’t going to go away even if the open data supplies dry up. They are hugely powerful. I’m thinking of insurers and retailers who have a lot of their own data about customers and are seeking better models of risk and understanding of customers. I think it’s difficult to measure economic benefit coming from this particular archetype.

The last area is enablers. These are organizations that don’t make money from open data directly but provide platforms and technologies that other businesses and individuals use. Competition websites are a very good example, where they provide a facility that allows businesses, public sector institutions, or research institutions to make subsets of their data available to seek solutions from the crowd.

Those are the five principal archetypes. The one that stands out, underpinning the open data market at the moment, is the “enricher” model. I think the hope is that the startups and small-to-medium enterprises in the aggregation and the developer areas are going to be the new engine for growth in open data.

Do you see adjustments being made based upon demand? Or are U.K. data releases conditioned upon what the government finds easy or politically low-risk?

Harvey Lewis: This comes back to my point about multiple objectives. The government in the U.K. is addressing a set of objectives through its open data initiative, one of which is economic growth. I’m sure it’s the same as in other countries around the world.

If the question is whether the government is releasing the right data to meet a transparency objective, then the answer is “yes.” Is it releasing the right data from an economic growth perspective? The answer is “almost.” It’s certainly doing an increasingly better job at that.

This is where the Open Data Institute really comes to the fore, because their remit, as far as the government is concerned, is to stimulate demand. They’re able to go back to the government and say, “Look, the real opportunity here is in the wholesale and retail sector. Or in the real estate sector — there are large swaths of government data that are valuable and relevant to this sector that are underutilized.” That’s an opportunity for the government to engage with businesses in those sectors, to encourage the use of open data and to demonstrate the benefits and outcomes that they can achieve.

It’s a very good question, but it depends on which objective you’re thinking about as to whether or not the answer is the right one. I think if you look toward the Danish government, for example, and the way that they’re approaching open data, there’s been a priority on economic growth. The sorts of datasets they’re releasing are going to stimulate growth in the Danish market, but they may not satisfy fully the requirements that one might expect from a transparency perspective or social growth perspective.

Does data format or method of release matter for outcomes, to the extent that you could measure it?

Harvey Lewis: From our analysis, data released through APIs and, in particular, transport data was in significant demand. There were noticeably more applications being built on the back of transport data published through an API than in almost any other area.

As a mechanism for making it easy for businesses to get hold of data, APIs are pretty crucial. Being able to provide data using that mechanism is a very good way of stimulating use.

Based on some of the other work that we’ve been doing, there’s a big push to release data in its raw form. CSV is talked about quite a lot. In some cases, that works well. In other cases, it is a barrier to entry for small-to-medium enterprises.

To go back to the general practitioner prescribing data, a single month’s worth of data is published in a CSV file each month. The file size is about half a gigabyte and it typically contains over four million records. If you’re a small-to-medium enterprise with limited resources, or even if you’re a journalist, you cannot open that data file in typical desktop or laptop software. There are just too many records. Even if you can find software that will open it, running queries on it takes a very long time.

There’s a natural barrier to entry for some formats that you really only appreciate once you try to process and get to grips with the data. That, I think, is something that needs to be thought through.
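
Editor’s sketch: one common way around the problem Lewis describes is to stream a large CSV row by row instead of opening it whole. The minimal Python example below assumes a file with a header row; the column names (practice, item, cost) and the sample values are made up for illustration and are not the schema of the actual prescribing data.

```python
import csv
import io
from collections import defaultdict


def total_by_column(handle, key_column, value_column):
    """Stream a CSV one row at a time and sum value_column per key_column."""
    totals = defaultdict(float)
    reader = csv.DictReader(handle)
    for row in reader:  # only the current row is held in memory
        totals[row[key_column].strip()] += float(row[value_column])
    return dict(totals)


# Tiny made-up sample standing in for a multi-million-row monthly file.
# The column names here are hypothetical.
SAMPLE = io.StringIO(
    "practice,item,cost\n"
    "A001,Aspirin,10.50\n"
    "A001,Ibuprofen,4.25\n"
    "B002,Aspirin,7.00\n"
)

totals = total_by_column(SAMPLE, "practice", "cost")
print(totals)
```

For a real download, the same function could be given a file opened with open(path, newline=""), so that memory use stays roughly constant however many records the file holds. It still requires someone who can write code, which is the barrier Lewis is pointing to.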

There’s an imperative to get data out there, but if you provide that data in a format that small-to-medium enterprises can’t use, I think it’s unfair. Larger businesses have the tools and the specialist capability to look at these files. That creates a problem, an economic barrier. It also creates a transparency barrier because although you may be publishing the data, no one can access it. Then you don’t get the benefits of increased transparency and accountability.

Where you’ve got potentially high-value datasets in health, crime, spending data and energy and environment data, a lot of care needs to be put into what formats are going to make that most easily accessible.

It isn’t always obvious. It isn’t the CSV file. It certainly isn’t the PDF! It isn’t anything, actually, that requires specialist knowledge and tools.

What are the next steps for your research inquiry?

Harvey Lewis: We’re continuing our work, trying to formulate ideas and methods. That includes using case studies and use cases, getting information from the public sector about how much it costs to generate the data, and looking at accounts of actual scenarios.

Understanding the economic impact, despite its challenges, is really important to policymakers around open data, to ensure that the benefits of releasing open data outweigh the costs of producing it. That’s absolutely essential to the business case of open data.

The other part of our activity is focusing on the insights that can be derived from open data that benefit the public sector or private sector companies. We’re looking quite hard at the growth opportunities in open data and the areas where significant cost savings or efficiencies can be gained.

We’re also looking at some interesting potential policy areas by mashing up different sources of data. For example, can you go some way to understanding the relationship between crime and mental health? With the release of detailed crime data and detailed prescribing data, there’s an opportunity, at a very granular level, to understand potential correlations and then do some research into the underlying causes. The focus of our research is subtly shifting toward more use-case type analysis, rather than looking at an abstract, generic picture about open data.
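
Editor’s sketch: a granular “mash-up” of the kind Lewis describes usually comes down to joining two datasets on a shared geographic key and measuring how the values move together. The Python example below is a minimal illustration under stated assumptions: the area codes and counts are invented, and the function names are hypothetical. It reports only a correlation coefficient, which, as Lewis notes, is a starting point for research into causes rather than an answer.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def correlate_by_area(first, second):
    """Join two {area_code: value} mappings on shared keys, then correlate."""
    shared = sorted(first.keys() & second.keys())
    return pearson([first[k] for k in shared], [second[k] for k in shared])


# Invented figures keyed by invented area codes, for illustration only.
crime_counts = {"E01": 10, "E02": 20, "E03": 30, "E04": 5}
prescription_counts = {"E01": 1, "E02": 2, "E03": 3, "E05": 9}

r = correlate_by_area(crime_counts, prescription_counts)
print(round(r, 3))
```

Only areas present in both datasets are compared, so mismatched coverage between sources (E04 and E05 here) silently shrinks the sample. That is worth checking before reading anything into the result.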

Bottom line: does releasing open data lead to significant economic benefit?

Harvey Lewis: My instinct and the data we have today suggest that it is going to lead to significant economic benefit. Precisely how big that benefit is needs further study.

I think it’s likely to be more in the realm of the broader impacts and some of the intangibles where we see the greatest impact, not necessarily through new businesses starting up and more businesses using open data. We will see those things.

This post is part of our ongoing investigation into the open data economy.