Democratizing data, and other notes from the Open Source convention

There has been enormous talk over the past few years of open data and what it can do for society, but proponents have largely come to admit: data is not democratizing in itself. This topic is hotly debated, and a nice summary of the viewpoints is available in this PDF containing articles by noted experts. At the Open Source convention last week, I thought a lot about the democratizing potential of data and how it could be realized.

Who benefits from data sets

At a high level, large businesses and other well-funded organizations have three natural advantages over the general public in the exploitation of data sets:

The resources to gather the data
The resources to do the necessary programming to crunch and interpret the data
The resources to act on the results

These advantages will probably always exist, but data can be useful to the public too. We have some tricks that can compensate for each of the large institutions’ advantages:

Crowdsourcing can create data sets that can help everybody, including the formation of new businesses. OpenStreetMap, an SaaS project based on open source software, is a superb example. Its maps have been built up through years of contributions by people trying to support their communities, and it supports interesting features missing from proprietary map projects, such as tools for laying out bike paths.
Data-crunching is where developers, like those at the Open Source convention, come in. Working at non-profits, during week-end challenges, or just on impulse, they can code up the algorithms that make sense of data sets and apps to visualize and accept interaction from people with less technical training.
Some apps, such as reports of neighborhood crime or available health facilities, can benefit individuals, but we can really drive progress by joining together in community organizations or other associations that use the data. I saw a fantastic presentation by high school students in the Boston area who demonstrated a correlation between funding for summer jobs programs and lowered homicides in the inner city–and they won more funding from the Massachusetts legislature with that presentation.

Health care track

This year was the third in which the Open Source convention offered a health care track. IT plays a growing role in health care, but a lot of the established institutions are creaking forward slowly, encountering lots of organizational and cultural barriers to making good use of computers. This year our presentations clustered around areas where innovation is most robust: personal tracking, using data behind the scenes to improve care, and international development.

Open source coders Fred Trotter and David Neary gave popular talks about running and tracking one’s achievements. Bob Evans discussed a project named PACO that he started at Google to track productivity by individuals and in groups of people who come together for mutual support, while Anne Wright and Candide Kemmler described the ambitious BodyTrack project. Jason Levitt gave the science of sitting (and how to make it better for you).

In a high-energy presentation, systems developer Shahid Shah described the cornucopia of high-quality, structured data that will be made available when devices are hooked together. “Gigabytes of data is being lost every minute from every patient hooked up to hospital monitors,” he said. DDS, HTTP, and XMPP are among the standards that will make an interconnected device mesh possible. Michael Italia described the promise of genome sequencing and the challenges it raises, including storage requirements and the social impacts of storing sensitive data about people’s propensity for disease. Mohamed ElMallah showed how it was sometimes possible to work around proprietary barriers in electronic health records and use them for research.

Representatives from OpenMRS and IntraHealth international spoke about the difficulties and successes of introducing IT into very poor areas of the world, where systems need to be powered by their own electricity generators. A maintainable project can’t be dropped in by external NGO staff, but must cultivate local experts and take a whole-systems approach. Programmers in Rwanda, for instance, have developed enough expertise by now in OpenMRS to help clinics in neighboring countries install it. Leaders of OSEHRA, which is responsible for improving the Department of Veteran Affairs’ VistA and developing a community around it, spoke to a very engaged audience about their work untangling and regularizing twenty years’ worth of code.

In general, I was pleased with the modest growth of the health care track this year–most session drew about thirty people, and several drew a lot more–and both the energy and the expertise of the people who came. Many attendees play an important role in furthering health IT.

Other thoughts

The Open Source convention reflected much of the buzz surrounding developments in computing. Full-day sessions on OpenStack and Gluster were totally filled. A focus on developing web pages came through in the popularity of talks about HTML5 and jQuery (now a platform all its own, with extensions sprouting in all directions). Perl still has a strong community. A few years ago, Ruby on Rails was the must-learn platform, and knock-off derivatives appeared in almost every other programming language imaginable. Now the Rails paradigm has been eclipsed (at least in the pursuit of learning) by Node.js, which was recently ported to Microsoft platforms, and its imitators.

No two OSCons are the same, but the conference continues to track what matters to developers and IT staff and to attract crowds every year. I enjoyed nearly all the speakers, who often pump excitement into the dryest of technical topics through their own sense of continuing wonder. This is an industry where imagination’s wildest thoughts become everyday products.

Democratizing data, and other notes from the Open Source convention

Health care track draws a small and passionate core

Who benefits from data sets

Health care track

Other thoughts

Get the O’Reilly Programming Newsletter