Choosing the right license for open data

Why OpenStreetMap is moving from Creative Commons to the Open Database License.

You can’t copyright a fact. But that doesn’t mean that data and databases are exempt from legal discussions and licensing requirements, even if the intention is to share the data openly. Such is the case with the collaborative mapping project OpenStreetMap (OSM).

When OpenStreetMap launched, contributions to the project were licensed under the Creative Commons Attribution/ShareAlike license. That meant anyone could copy OSM data, but if it was incorporated into another project, the same terms and conditions applied (ShareAlike) and the copyright owner had to be credited (Attribution). Although this license doesn’t appear controversial and seems to fit nicely with the OpenStreetMap mission, there were problems, if for no other reason than that Creative Commons licenses are meant to handle creative works, not data.

After much discussion with lawyers and with the community, OpenStreetMap opted to make the move to the Open Database License (ODbL), arguing it was more suited to OSM’s purposes. I recently asked OSM founder Steve Coast about the decision and the process of making the switch.

What compelled OpenStreetMap to change to the Open Database License?

Steve Coast: Licensing is incredibly important for the community to trust that the data won’t be closed off. We need to make sure that data from OpenStreetMap will always be free and open. It’s also important that we are able to stop anyone from trying to close it off or derive from it without giving back to the community. We are in a multi-year process to re-license, based on advice from multiple sources that the Creative Commons licenses are not applicable to data. We wish they were, and they probably will be in the future, but that wasn’t clear when we began. Until that happens, we have a process to move to the Open Database License, which explicitly covers data and not just creative works like photographs or text. The ODbL was in fact started as a result of investigations around the needs of Science Commons, and we just helped it to its conclusion.

At some point down the line I personally expect the ODbL and CC to be compatible and we will be able to cross-pollinate once more.


What are some of the arguments for and against the re-licensing?

Steve Coast: The arguments for are pretty clear. Creative Commons, as they themselves have asserted, doesn’t cover the rights around data. We seek that protection.

The arguments against range from staying the course to believing that CC does in fact cover data. Some people feel CC has worked thus far, so why change? Others feel CC does cover data. These are legitimate arguments to be discussed, and we have discussed them many times. We also have a small amount of data derived from sources (like aerial imagery) whose owners either refuse to change or are unresponsive. With them we have to work closely to find a way through.

What are some of the challenges of the move — in terms of technology, legality and the community?

Steve Coast: Probably the most annoying problem we have is that we cannot release legal advice. The lawyers helping us give us clear directions for the most part, but we can’t just take their emails and forward them openly to the community. If we did, we would lose our legal rights, including the privilege that protects communications between us and our advisers. So we walk a fine line between being as open as we can and acting on the best advice we can get. That can be frustrating, because people in the community can feel alienated and suspect we might be hiding something bad.

Technically, we will get to the point where we have to remove some data from the project, if only because some contributors will be unreachable. It will be fairly insignificant given the size of the OSM corpus and the rate at which it grows, but still, we’d prefer not to remove anything.

For the most part the community has been fantastic around this, especially the core people going through the process week after week for multiple years. We have our loud minority like any open community, and we try to be as accommodating as we can. I would describe it as a vast evangelizing exercise. Explaining what we’re doing and why to the first 10 people takes about six months. The next 100 take about another six months, as those initial 10 each talk to another 10. Then the next 1,000 take another six months. The next 10,000 take six months. Then the next 100,000 the same again. So it feels to the initial 10 or so like a very long slog, explaining everything many, many times. But we have to be mindful that people come along all the time who have no idea why we’re changing or what the plan looks like. So we need some very diplomatic people helping the process along. And of course all of those people looking at the license and the process expose bugs in it, which we have then worked out over time.

Having worked through this move for several years now, what are some of the lessons learned?

Steve Coast: I chose CC-BY-SA for the right community and openness reasons, but I was not and am not a lawyer. You have to choose a license up front so people know that their contributions will not be closed off. On the other hand, if the data had been dual licensed to the OSM Foundation or to me personally, we could have switched the license much more quickly. The lesson, I think, is to have multiple options. You don’t know which direction things will go three, four or five years down the line. For all the time and pain this change is costing us, it’s an extremely healthy and maturing thing to do. It’s been a forcing function on the structure of the foundation that supports the project, and it has made us build working groups and have a functioning board and finance structures. On balance it’s probably been a good thing.

The other lesson is to just be insanely open, and realize there will always be conspiracy theorists whom you will never convince. All our meetings were and are open. We have open minutes. But there are some who will refuse to use a telephone, or want you to use technology X to hold your meeting, or have it at time Y. You simply can’t satisfy everyone. You have to make some choices while trying your best to accommodate demands. Don’t get pulled down the rathole of trying to make everyone happy all of the time. You can shift the meeting once a month, or try the occasional new thing, but you have to make progress no matter what.

Looking back, I wish that I’d traveled more to spread the message and talk to more people in person. But that’s extremely hard and expensive to do.

Hopefully other projects can start based on the years we have put into this and license with the ODbL, or perhaps dual-license with CC-BY-SA. With luck they will never have to know how much work it took to get here.

This interview was condensed and edited.


  • g

    > It will be fairly insignificant given the corpus of OSM and the rate at which it grows but still, we’d prefer to not remove anything.

    Yeah, 50% of Australia’s data is pretty insignificant (source: http://odbl.de/). Not.

  • h

    Actually, 50% of Australia’s data IS pretty insignificant when you start looking at the numbers involved (and actually the ABS2006 CC-BY import data is nearer 15% rather than 50% of that data). For whatever reason, OSM is still largely a northern-hemisphere project.

    That’s not to say that Australian mappers don’t do an excellent job – they do; I’ve been there and used maps created with their data. The problem is that there just aren’t enough of them, and using imports to fill the gaps is problematical, and not just for licensing reasons.