The City of Chicago wants you to fork its data on GitHub

GitHub has been gaining new prominence as the use of open source software in government grows.

Earlier this month, in a piece exploring GitHub’s role in government, I included a few thoughts from Brett Goldstein, Chicago’s chief information officer, about the city’s use of GitHub.

While Goldstein says that Chicago’s open data portal will remain the primary means through which the city releases public sector data, publishing open data on GitHub is an experiment worth watching: will it increase reuse of the data, or collaboration around it?

In a follow-up email, Goldstein, who also serves as Chicago’s chief data officer, shared more about why the city is on GitHub and what it is learning. Our discussion follows.

The City of Chicago is on GitHub.

What has your experience on GitHub been like to date?

Brett Goldstein: It has been a positive experience so far. Our local developer community is very excited by the MIT License on these datasets, and we have received positive reactions from outside Chicago as well.

This is a new experiment for us, so we are learning along with the community. For instance, GitHub was not built to be a data portal, so it was difficult to upload our buildings dataset, which was more than 2GB. We are rethinking how to deploy that data more efficiently.

Why use GitHub, as opposed to some other data repository?

Brett Goldstein: GitHub provides the ability to download, fork, make pull requests, and merge changes back into the original data. This is a new experiment to see whether it’s possible to crowdsource better data, and GitHub provides the necessary functionality. We already had a presence on GitHub, so this was a natural extension of it and a complement to our existing data portal.

Why does it make sense for the city to use or publish open source code?

Brett Goldstein: Three reasons. First, it solves licensing issues with incorporating the data into open source and proprietary projects. The city’s data is already available for public use, and this step removes any remaining licensing barriers. We chose these datasets because they are incredibly useful in the daily lives of residents and visitors to Chicago, so they are the most likely to be used in outside projects. We hope this data can be incorporated into existing projects. We also hope that developers will feel more comfortable building applications or services on data released under an open source license.

Second, it fits within the city’s ethos and vision for data. These datasets cover things that are visible in daily life: transportation and buildings. This is not proprietary data, and it should be open, editable, and usable by the public.

Third, we engage in projects like this because they ultimately benefit the people of Chicago. When we do what we can to support a more creative and vibrant developer community, our residents not only get better apps; they also get a smarter, more nimble government that uses tools created by sharing data.

We open source many of our projects because we feel the methodology and data will benefit other municipalities.

Is anyone pulling it or collaborating with you? Have you used that code? Would you, if it happened?

Brett Goldstein: We collaborated with Ian Dees, a significant contributor to OpenStreetMap, to launch this idea. We anticipate that the buildings data will be integrated into OpenStreetMap now that it’s available under a compatible license.

We have had 21 forks and a handful of pull requests fixing some issues in our README. We have not had a pull request that fixes the actual data.

We do intend to merge requests that fix the data, and we are working on our internal process to review, reject, and merge them. This is an exciting experiment for us, really at the forefront of what governments are doing, and we are learning along with the community.

Now that the data is available as JSON, is anyone using it who wasn’t before?

Brett Goldstein: We seem to be reaching a new audience by posting data on GitHub, working in tandem with our heavily trafficked data portal. A core goal of this administration is to make data open and available, and we have one of the most ambitious open data programs in the country. Our portal has over 400 datasets that are machine-readable, downloadable, and searchable. Because it’s hosted on Socrata, basic analysis of the data is possible as well.