Knight winners are putting data to work

The common thread among the Knight Foundation's latest grants: practical application of open data.

Data, on its own, locked up or muddled with errors, does little good. Cleaned up, structured, analyzed and layered into stories, data can enhance our understanding of the most basic questions about our world, helping journalists to explain who, what, where, how and why changes are happening.

Last week, the Knight Foundation announced the winners of its first news challenge on data. These projects are each excellent examples of working on stuff that matters: they’re collective investments in our digital civic infrastructure. In the 20th century, civil society and media published the first websites. In the 21st century, civil society is creating, cleaning and publishing open data.

The grants not only support open data but validate its place in the media ecosystem of 2012. The Knight Foundation is funding data science, accelerating innovation in the journalism and media space to help inform and engage communities, a project that they consider “vital to democracy.”

Why? Consider the projects. Safecast creates networked accountability using sensors, citizen science and open source hardware. LocalData is a mobile method for communities to collect information about themselves and make sense of it. Open Elections will create a free, standardized database stream of election results. Development Seed will develop better tools to contribute to and use OpenStreetMap, the “Wikipedia of maps.” Pop Up Archive will develop an easier way to publish and archive multimedia data to the Internet. And Census.IRE.org will improve the ability of a connected nation and its data editors to access and use the work of the U.S. Census Bureau.

The projects hint at a future of digital open government, journalism and society founded upon the principles that built the Internet and World Wide Web and strengthened by peer networks between data journalists and civil society. A river of open data flows through them all. The elements and code in them — small pieces, loosely joined by APIs, feeds and the social web — will extend the plumbing of digital democracy in the 21st century.

You can read interviews with the winners in the linked sections below. Each was lightly edited for content and clarity.


Why did the Knight Foundation fund these projects?

Chris Sopher, the Knight Foundation’s journalism program associate, offered more context for their grant choices.

What made these winners stand out to you? Why has data taken on the importance it has in media today?

Chris Sopher: We received and read a lot of compelling applications using data in interesting ways. Ultimately, we chose these six projects because they do more than collect or open data: they make data useful, and aim to solve demonstrated needs people are experiencing.

For example, Safecast will eventually be able to give people air quality information that’s actionable, down to a neighborhood or even immediate level, rather than a regional average that’s not of much use to people with health concerns or people who want to identify sources of pollution. Serdar Tumgoren and Derek Willis got started on OpenElections because they noticed lots of reporters needing historical election information, which seems like an obvious resource for political reporting, and not being able to find it. All these projects are addressing real needs through data.

Data writ large is important to the media ecosystem, and to our goals at Knight Foundation. There’s now so much of it available but we’re still so bad at curating and making it useful. There’s a lot to be learned from using and understanding data (of all sizes, too — not just “big data”) as a core part of journalism or any information project. Unfortunately, our tools and practices haven’t caught up to the availability of that data. That’s part of why we chose it as a theme for this round of the News Challenge.

How does data play a role in the business models that will sustain news gathering and accountability for government at the local, state and federal level?

Chris Sopher: Governments, and other organizations journalists cover, collect an incredible amount of data of all sorts. It’s not always easy to access, and there’s a long way to go, but open government and open data have made fairly remarkable strides in the last few years. There’s considerably more information — and increasingly there are much better tools — available to help journalists and others understand what’s happening at any level of government. [They can] help contextualize news coverage of a particular event or help a news organization understand its community and how best to serve it.

A basic example of this is the Texas Tribune’s public employee salary database, which, despite its simplicity, is persistently one of the most popular things on the site. A more complex example is Umbel, which uses data to create insight profiles for an audience and help you understand more about what your community likes, how to serve them better, and how to generate more revenue from them.

What could this collection of people and technology change in society?

Chris Sopher: All of these projects are using data to enable action and help others improve their work. Census.IRE.org will help journalists report better on community and national trends. LocalData will help groups that are already interested in data to collect and analyze it in more productive ways. Development Seed is building tools that help OpenStreetMap take advantage of the tremendous attention and growth it’s received recently. We see all of these as part of an infrastructure that helps journalists better understand and serve their communities through data.

How will the work that these organizations do filter down to inform or empower, say, retirees or college students?

Chris Sopher: This varies by the project. LocalData, for example, lets anyone collect data about their community, whether that’s for a class project or community organizing or journalism. The design is aimed at making it easy for people like your niece to set up and use, either on paper or on a smartphone. There’s also an infrastructure layer to each of these, creating tools and platforms that others will build on for the end user.

Safecast’s air quality project will collect data that anyone can access, but will perhaps more importantly allow for others to build applications on top of that data that serve particular communities, such as people with health concerns. All of these projects proposed, and are receiving funding for, a phase in which they’ll focus on building these use communities.

How will these organizations play a role in the future of open government and the current open data movement?

Chris Sopher: The first part of this answer is to say that we’re very interested in this question, and it’s part of the reason we chose open government as the first News Challenge topic of 2013.

As for this challenge, these projects are of the open data movement. OpenStreetMap has been one of the open data movement’s most interesting recent success stories, in its ability to get adoption by major players like Foursquare, with only a tiny formalized organization supporting it.

LocalData comes from a team of Code for America fellows who, during their year in Detroit, noticed that lots of community groups and government agencies were gathering data about topics like urban blight but didn’t have good tools for designing and organizing that process. They found that the biggest obstacle to open data at the local level wasn’t resistance to the concept but rather the barrier to entry created by complicated tools.

We see this group of projects as building the field for open data. We will focus more specifically on how those concepts apply to government in the 2013 challenge.


Pop Up Archive

Founders Anne Wooton and Bailey Smith offered more context on Pop Up Archive and what they plan to do with it.

Where did your idea come from, originally? How long did it take to build? What tech did you use?

Anne Wooton and Bailey Smith: In the fall of 2010, Anne met Davia Nelson, one half of the Kitchen Sisters. The Kitchen Sisters are local independent producers who have been working in public radio for 30 years. Over that time, they’ve compiled a collection of thousands of hours of archival sound and ancillary media. Nikki Silva, the other half of The Kitchen Sisters, said that having an online archive would make it possible to do in five minutes what they consider impossible today.

Why impossible? Producers of public media, you might not be surprised to hear, lack resources. They lack archival and technical training. And they lack an understanding about how best to organize and provide access to their content. Most producers are loath to commit to any new system because they’re wary of a rapidly changing technology landscape. Why invest now if things will change in five years? But most importantly, archiving is overwhelming. They’ll have to make decisions about the scale of archives, legacy data, and structuring media for the web, when they would rather dedicate limited time and resources to the production of new content.

So, we set out to fix this problem. We spent a year developing an alpha version of a simple organizational system for archival audio content as our master’s thesis at UC Berkeley’s School of Information.

To begin, we adopted and adapted an open-source archival publishing platform, Omeka, because it’s a great piece of software with low barriers to entry. We built three plug-ins for Omeka: the first creates a metadata schema for public media based on the PBCore standard, the second uploads content for preservation at the Internet Archive, and the third enables one-click sharing of content through SoundCloud. However, as the project grows we’ll develop solutions that work with other content management systems being used in public media, such as Drupal.

What problem does Pop Up Archive solve that other methods for publishing data do not? If similar methods exist, what does it do better?

Anne Wooton and Bailey Smith: No solution yet exists for collections that have become accidental archives. Producers must choose between expensive commercial solutions and complex open-source tools that require substantial development to meet their needs.

Even if produced pieces are distributed in a variety of ways — whether through PRX, NPR and its affiliate stations, or web services like SoundCloud and Stitcher — it’s a time-consuming and redundant process. There is no standard way to organize and retrieve broadcast content. Every producer and every station is currently devising their own way to manage the madness. One of our interviewees explained that his organization has many separate workflows composed of reporters, producers, fact checkers and editors. Each of these parties has their own process of file management, metadata creation and storage. Most often files are shared by email or Dropbox, which strips the file of critical metadata and context before it’s ushered along in the process of production.

And that doesn’t even speak to the thousands of hours of archival audio that are wasting away on decaying hard drives in closets across America. Producers are at a loss when it comes to finding, accessing and sharing archival content.

What data problem does this fundamentally address?

Anne Wooton and Bailey Smith: Web archiving is changing. Content creation has become easier in the last decade, and archiving needs to catch up. Open source archive solutions exist, but they aren’t usually easy to implement or maintain. So many content creators just choose not to archive. The result is countless “orphaned” collections. And without care these valuable cultural artifacts eventually degrade, get thrown away, and just generally disappear. So it’s imperative not to wait until a collection is endangered to begin archiving. Archiving has to become simpler.

Metadata is information about content (what it’s called, who created it, where it was created, etc.). It makes searching media possible. Right now broadcast metadata is inconsistent and incomplete. And this is because, up until now, most producers and broadcasters only needed to share their metadata within their own organization.

That’s not true anymore. Producers need and want to share their content, and in order to do that successfully, they need a standardized metadata schema that makes it possible to share content with outside organizations and systems.
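As a rough illustration of what a shared schema buys a producer, here is a minimal Python sketch of a metadata record built around the three questions named above: what the content is called, who created it, and where it was created. The field names, the `AudioRecord` class and the sample values are all hypothetical. They are not drawn from PBCore or from Pop Up Archive's plug-ins.

```python
from dataclasses import dataclass, asdict

# Hypothetical minimal schema. These field names are illustrative only and
# are not taken from PBCore or from Pop Up Archive's own plug-ins.
REQUIRED_FIELDS = ("title", "creator", "location_created", "date_created")


@dataclass
class AudioRecord:
    title: str = ""
    creator: str = ""
    location_created: str = ""
    date_created: str = ""  # ISO 8601 date string, e.g. "1984-06-01"


def missing_fields(record: AudioRecord) -> list[str]:
    """Return the required fields that are empty or whitespace-only."""
    values = asdict(record)
    return [name for name in REQUIRED_FIELDS if not values[name].strip()]


# Invented sample records, for illustration only.
complete = AudioRecord("Interview, side A", "Jane Producer", "Oakland, CA", "1984-06-01")
partial = AudioRecord(title="tape_0042.wav")

print(missing_fields(complete))  # []
print(missing_fields(partial))   # ['creator', 'location_created', 'date_created']
```

A check like this is the kind of thing a shared schema makes possible: two organizations that agree on the required fields can tell at a glance whether a file is ready to exchange.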

Who will be the users? Who will be the audience?

Anne Wooton and Bailey Smith: Producers and public media organizations will be our primary users and audience, as they seek both to organize their own content and exchange data in the creation of new work. We see both enterprise and consumer-level potential for the system: anyone with an audio or video collection who would like to access and share content using web technologies. Our audience will potentially include oral history scholars as they search for content in the course of their work. By creating a process that anyone can follow — regardless of technological or archival training — we are meeting an urgent need that we heard voiced time and again in our interviews with producers, listeners, and public media organizations.


OpenStreetMap and Development Seed

Eric Gundersen, the president and co-founder of Development Seed, explained more about his work in the interview below. For more context on why this matters, read Nick Judd on open mapping tools in techPresident and Carl Franzen on what this investment could mean for an open source digital map revolution.

Why does this matter?

Eric Gundersen: There are two ways the geo data space is going to evolve: 1) in closed silos of proprietary data or 2) in the open. Our community does not need a fleet of cars driving millions of miles. We need good infrastructure to make it easy for people to map their surroundings and good community tools to help us garden the data and improve quality. As geo data becomes core to mobile, maps are a canvas for visualizing the “where.”

OpenStreetMap (OSM) was famously used by relief workers in Haiti after the 2010 earthquake. What’s changed about OSM since?

Eric Gundersen: That is such a great example of how the community can come together to work where a market does not get motivated. I think the real win of that story was the fact that first responders could put the data onto their devices so quickly. You can’t do that when the data is closed. You can’t do anything when the data is closed.

In 2011, OpenStreetMap’s community continued to grow, as did its more mainstream users (read: Foursquare). Would Foursquare push to improve Google’s map? Open, at the end of the day, is going to create the right incentives for people to add data. There is only one open player: OSM.

Does the ongoing kerfuffle over iOS 6 maps highlight potential issues ahead for OSM data quality?

Eric Gundersen: Quality issues show the need to have a feedback loop from users like both OSM and Google do. The bigger question is simple. Apple is going to have to spend $1 billion? That is what this is going to take. Or they need to think about the issue differently, and start investing in common data spaces, with other players, like OSM.

What is OSM enabling people to do that they couldn’t do with ESRI or Google Mapmaker or other tools?

Eric Gundersen: OSM’s data is open. No other mapping players can say that. This is not just about a pretty map. This is about laying the infrastructure for one of the most exciting data communities to scale. You will be incentivized to add data because no one can ever take it away from you. You can do whatever you want with it, making totally unique apps and maps. This is all about community and open data, and a large community curating open data is going to make the best map in the world.


Census.IRE.org

In the interview below, John Keefe, Joe Germuska and Ryan Pitts explain more about their project. Radar readers will recognize Keefe from a profile of his work at WNYC in our series on data journalism earlier this year.

What does “expanding and simplifying” this tool mean?

John Keefe: There’s a lot that’s possible, and we’ll need to spend some time figuring out exactly what to tackle. One of the things that comes up a lot in the WNYC newsroom is the need for American Community Survey (ACS) data, so that’s high on the list. That’s not in the current tool, which has only decennial census data. ACS has valuable social and economic information, and it’s also trickier for journalists to use because it’s an estimate, with margins of error, instead of a count. So having an easy way to view and use that info would be great.

Joe Germuska: John’s got it. In short: add the ACS, which has the juicy data that reporters are actually asking for when they ask me for census data. Improve the UX, so that it’s more straightforward to find the place you want to know about and easier to apprehend the data about that place once you find it. On that front, rather than simply throwing a bunch of data tables on the page, I’d like to work out the top N figures that people generally want and make charts that present those figures in context. Most simply, that would be as parts of a set (like a simple age distribution chart instead of this) but also possibly in relation to other places, or maybe even to historic data, although comparison over time becomes complicated quickly.

Ryan Pitts: Yes, yes, yes for ACS data — it’s really the stuff that we want to dig into in our newsroom as well. The current census.ire.org site is such a useful fact-finding and verification tool, it will be exciting to give journalists more to explore there. To piggyback on Joe’s comment about providing some contextual charts, I would also love to see us make those easy for people to embed on their own sites. A lot of reporters and editors are comfortable copy/pasting, say, YouTube code into their content management systems, and it would be excellent to make it just as easy to support a story with a custom Census data chart. Things like this already exist to some extent, from the Census site itself for example, but to my knowledge it’s not as customizable as the data at census.IRE.org allows. And there’s real value in allowing a reporter to do his or her research and then grab a supporting visualization all in the same place.
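To make Keefe's point about margins of error concrete: because ACS figures are estimates rather than counts, adding two of them also means combining their margins of error. The Census Bureau's published guidance approximates the margin of error of a sum as the square root of the sum of the squared margins. The Python sketch below applies that approximation. The function name and the sample numbers are invented for illustration and do not come from census.ire.org.

```python
import math


def combine_estimates(parts):
    """Sum ACS-style estimates and approximate the margin of error of the sum.

    parts: list of (estimate, margin_of_error) pairs, where every margin
    is stated at the same confidence level.
    Uses the root-sum-of-squares approximation for the combined margin.
    """
    total = sum(estimate for estimate, _ in parts)
    combined_moe = math.sqrt(sum(moe ** 2 for _, moe in parts))
    return total, combined_moe


# Invented figures: two neighboring areas, each an estimate with its margin.
areas = [(1200, 150), (800, 200)]
total, moe = combine_estimates(areas)

print(total)  # 2000
print(moe)    # 250.0
```

Note that the combined margin (250) is smaller than the two margins simply added together (350), which is one reason a tool that handles this arithmetic for reporters could prevent overstated uncertainty as well as understated uncertainty.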

Who are the primary consumers for this? Who’s the audience?

John Keefe: Anyone in a newsroom. And, since it’s open to the public, anyone who’s curious about demographics by state, county or place. One of the reasons I love the existing tool is that anyone in a newsroom can use it. You don’t have to be experienced in navigating census data to answer simple questions — like how many children are in Brooklyn?

Joe Germuska: Journalists, plain and simple. Being able to focus on what is useful to a journalist will allow us to make editorial choices that the Census Bureau wasn’t at liberty to make when the FactFinder was built, like omitting the more than 60 summary levels that are specific to Native American reservations. That said, I am very happy with the fact that serving journalists well will also mean we probably serve any interested generalist fairly well, including students or community leaders.

What’s been the use of Census.IRE.org to date?

John Keefe: At WNYC, we’ve used it for easily dozens of stories. Sometimes, it’s just to answer some simple demographic questions. Other times, we use it for maps like a protected district, or on gay couples, the Latino makeup of NYC, or kids or seniors in NYC.

Joe Germuska: Based on anecdotal information from sources like the NICAR-L mailing list and conversations at conferences, a number of journalists found it helpful in reporting on the decennial census data as it was rolling out.

Why does census data like this matter to the American people and our understanding of ourselves and how we’re changing?

John Keefe: Census data has a lot of impact on everything from redistricting to distribution of government dollars. It can put real numbers to how our cities and neighborhoods are changing — and that the Census Bureau makes it all public is great. But the tools currently available to navigate that data — and even ask some basic questions — can be daunting.

Ryan Pitts: One of the things I’m a little bit obsessed with at the moment is finding ways to help people understand their community at a deeper level than “everyone knows it sucks to live in neighborhood X.” I mean, we all can identify the good and bad places to live in our communities. We all think we have a sense of why they’re good or bad — income levels, crime rates, and so on. But all these factors are so interrelated. Lower income levels limit educational attainment, but then lower education levels cycle back to limited income. Access to transportation can affect both those things. And all three affect access to health care, and poor health has its own set of effects … it’s such a complicated picture.

I’m really hopeful that by making data about these facets of our communities more accessible to journalists, we’ll make it easier for them to report stories that help readers unpack the complexity. Narrative along with this kind of data is a really powerful combination. I think it’s the kind of thing a community needs before it can get at the really important question: “So what do we do about this?”

Where has the value of the U.S. Census data been most clear?

John Keefe: For me, it’s in visualizing the makeup of our city and answering the basic questions that come along every day. How many kids are in public school? Where and how have the ethnic neighborhoods in NYC changed? Where is the wealth and poverty? What is the racial makeup of police precincts?

How would the U.S. House cutting funding for the ACS affect this project?

Joe Germuska: If the ACS were cancelled, this project would be a lot less valuable. I’m skeptical that it will actually be cancelled. ACS data is useful to businesses as well as governments and academics.

Does the U.S. Census setting up an API matter to your project? If so, why?

Joe Germuska: We won’t use the API for the project. I’m pretty sure that we will want to have too much data and do too much computation for it to be effective to make real-time queries for all of it. Also, one of our “stretch goals” is to distill our method for processing the Census data into a toolkit for people who want to do more intensive analysis beyond what our website provides, along the lines of these tools and bulk data.

It is probably the case, however, that we won’t bother trying to make an API ourselves. We dabbled in that for the first wave (see the JavaScript library) but didn’t see much uptake. I suppose if along the way we think we can provide a substantially better service than the Census API we’ll consider it, but there are other more interesting “do it if we can” projects.

Ryan Pitts: I’m with Joe here. There are enough features we can work on that have day-to-day application that I’m not sure it makes a lot of sense to provide a competing API.


LocalData

My interview with Alicia Rouault, a 2012 Code for America fellow who worked with Amplify Labs’ Prashant Singh and Matt Hampel on the LocalData project, follows.

What does this platform change?

Alicia Rouault: LocalData helps residents, government agencies, and nonprofits rapidly gather accurate and useful neighborhood-level data. It also bridges the gap between technically useful data-oriented tools and simple, friendly consumer tools.

How has it helped Detroit residents?

Alicia Rouault: LocalData has improved the efficiency and scale at which place-based information has been collected. A Wayne State University graduate planning class collected data on more than 9,000 commercial parcels in a matter of weeks. The information is already being applied and used, whereas traditional paper-based collection would have required weeks of manual transcription first.

What use has the data in it been applied to so far?

Alicia Rouault: Wayne State University’s graduate planning class has used the data to develop a citywide commercial corridor study for the City of Detroit’s Planning and Development Department. The extensive data has allowed the group to make recommendations on future zoning strategies for the city. You can find the report and collected data here. Currently, a smaller community development corporation in Detroit is piloting LocalData to survey housing conditions.

What will this funding let LocalData do?

Alicia Rouault: Knight Foundation funds allow us to continue work on LocalData beyond the fellowship year. We will use the runway to support the product in Detroit and bring it to other cities, including Chicago, Boston, New York, and San Francisco. We also plan to roll out new features that will help reduce the burden on technical assistance providers and make the data more useful.

Why is data important to cities? Why should people care about data collection in communities?

Alicia Rouault: Across the country, different kinds of groups (cities, nonprofits, universities) already collect place-based data. These groups add or update information on a variety of indicators — anything from recreational and environmental assets like parks, gardens and playgrounds, to the conditions of vacant lots and abandoned buildings, to where local businesses and institutions are located. This information is used to support advocacy campaigns, fundraising and political support of ideas or community interests.

It also helps with a critical need communities have to simply be informed! Media, governments and citizens can consume crowd-collected data to visualize the important components of where they live.

Does this help people on the other side of the digital divide? If so, how?

Alicia Rouault: Our goal is to make LocalData accessible to users of different ages, abilities, and digital access. In Detroit, we saw that many surveyors lacked smartphones or were more comfortable with pen and paper. We designed a scannable, paper-based survey to address that need.

Traditionally disempowered communities also benefit from more opportunities to collect data and tell their own stories.


Open Elections

Derek Willis, an elections developer at the New York Times, and Serdar Tumgoren, who works at the Washington Post, shared more about their plans for their Open Elections project in the interview below.

Why hasn’t a freely available, comprehensive source of official election results been created to date?

Derek Willis: For one reason, it’s a hard task. You’d need to retrieve data in different formats from 50 different states and, depending on the level of detail you wanted, thousands of counties. Another is that plenty of organizations have probably done this for their own local areas, but don’t always need results from a broader scope.

What are the barriers?

Derek Willis: Multiple sources of results data, in varying formats and with varying types of information. Some results are kept as PDFs or HTML files, while others might be in a more standardized format. A big barrier is that some of these results, particularly older ones (pre-2000), are likely only kept on paper in some states. States also don’t need to worry about connecting their data to other states or to federal data schemas such as that used by the Federal Election Commission.

What will creating this data feed change?

Derek Willis: It will make election results data available to anyone who wants to use it and enable them to get different slices of it. Say, a candidate’s electoral history or presidential election results for a single county. Serdar and I envisioned this kind of product because we wanted it for ourselves. We hope others will find it useful, too.

Why is it important?

Derek Willis: Election results are an important part of our political process, not simply on Election Day but when developing strategies and assessing a constituency. Historic results are not a guarantee of future electoral success, but they are an essential part of a journalist’s toolbox when covering elections.

What can be created with this data?

Derek Willis: We’re hoping that this data makes it possible to build rich interactives that can help explore results for a state, county or candidate. An example would be to illustrate how counties that meet certain demographic criteria have voted in elections, or to show where a candidate’s historic strongholds have been.

What data formats will you standardize on? Why? Will there be an API? Where will the data “live?” Will there be a license for it? Which one?

Derek Willis: We have not decided everything yet, but we will offer data in JSON, CSV and very likely some kind of XML format. We want to make it easy for people to use the data whether they have a spreadsheet or a programming language as tools. Since the data is public to begin with, it will be freely available, although Serdar and I have not talked about a specific license.

Serdar Tumgoren: We don’t currently have plans to offer a client-side API. If we do go that route, I’d again expect to lean on the cloud providers, aggressive caching via Varnish, and a long-lived TTL on resources. One of the virtues of dealing with historical data is that it doesn’t change much.
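Offering the same records as both JSON and CSV, as Willis describes, is straightforward when each result is kept as a flat row. The Python sketch below shows one way that could look using only the standard library. The column names and sample rows are invented for illustration. They are not the Open Elections schema, which the team says is still undecided.

```python
import csv
import io
import json

# Invented sample rows. The column names are hypothetical and are not the
# Open Elections schema.
RESULTS = [
    {"year": 2008, "state": "XX", "county": "Example", "candidate": "Candidate A", "votes": 1200},
    {"year": 2008, "state": "XX", "county": "Example", "candidate": "Candidate B", "votes": 950},
]


def to_json(rows):
    """Serialize rows as a JSON array, for use from a programming language."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the same rows as CSV text, for use in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(RESULTS))
print(to_json(RESULTS))
```

Keeping one flat row per result means the spreadsheet user and the programmer see the same fields, which fits the stated goal of serving people whether their tool is a spreadsheet or a programming language.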

What are the technical challenges here? Are there legal or cultural challenges?

Derek Willis: It’s really more of a management challenge, in my opinion. We have to organize all of this data over as many years as we can, and provide a standard way to identify, catalog and retrieve it. There’s no legal challenge, in that this is all public data. There’s no question about that. To the extent that there are technical challenges, they likely will come in getting good data out of scanned PDFs and other older formats.

Serdar Tumgoren: With regard to scaling up, it depends on what aspects of the project are in question. I agree with Derek that the management challenge is the most daunting aspect. How do we organize a data gathering process scattered across 50 states and dozens of contributors? Our first step will be to build a centralized data admin (Django + Postgres, deployed to Heroku) to serve as the backbone of a public-facing dashboard. We expect we’ll need to “brute force” some of the data gathering via Mechanical Turk. Lastly, we need to develop clear and simple guidelines for contributors. The open source community is a real inspiration on this front. We want to set clear guidelines so data sleuths and civic hackers alike can dive in and contribute.

In terms of “scaling” our architecture, I expect we’ll lean on one of the cloud providers (AWS, Rackspace, etc.) to support data downloads. Because datasets will be customizable, we may need to use a combination of caching (e.g. Memcached/Redis) and possibly task queuing to offload bigger jobs. This might mean a few minutes’ wait for people who request the data. It’s hard to say definitively until we see how popular we are.
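The caching idea Tumgoren describes can be sketched in a few lines of Python. This is a simplified illustration, not the project's design: an in-memory dictionary stands in for a cache server such as Redis, and `TTLCache`, `cache_key`, `build_dataset` and `get_dataset` are hypothetical names. The point it shows is that equivalent custom requests can share one cached result, and that a long time-to-live is reasonable because historical results rarely change.

```python
import time


class TTLCache:
    """Tiny in-memory cache with expiry, standing in for a cache server."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def cache_key(params):
    """Normalize request parameters so equivalent requests share one key."""
    return "&".join(f"{name}={params[name]}" for name in sorted(params))


build_count = 0


def build_dataset(params):
    """Stand-in for an expensive export job; counts how often it runs."""
    global build_count
    build_count += 1
    return f"dataset for {cache_key(params)}"


def get_dataset(cache, params):
    """Return a cached dataset if present, otherwise build and cache it."""
    key = cache_key(params)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = build_dataset(params)
    cache.set(key, result)
    return result


cache = TTLCache(ttl_seconds=24 * 60 * 60)  # long-lived: one day
first = get_dataset(cache, {"state": "XX", "year": 2008})
second = get_dataset(cache, {"year": 2008, "state": "XX"})  # same request, reordered
print(first == second, build_count)  # True 1
```

In a real deployment the slow `build_dataset` step is where a task queue would come in: the request would enqueue the job and the requester would wait, which matches the "few minutes' wait" described above.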


Safecast

In 2012, mobile technology, social media and data all offer new opportunities for situational awareness through collaboration between first responders and society during natural disasters and other crises. A growing number of free or low-cost online tools now empower people to do more than just donate money or blood: now they can donate time, expertise or, increasingly, act as sensors. Last year, Metroblogs co-founder and Los Angeles-based hacker Sean Bonner talked with me about radiation data and citizen science. The next phase of Safecast will enable people to help monitor air quality. My interview with Bonner follows.

What have you learned from the Safecast project to date, in terms of data quality, availability and storage?

Sean Bonner: I don’t think we’ve gotten too deep into worrying about data storage. Our dataset is still relatively small compared to some of the “big data” projects out there, but we’ve learned a lot about data quality and availability. The spark that set this whole thing into motion was the realization that our assumptions that quality data was readily available were wrong. I think a lot of people expect that things are being taken care of and that someone else is paying attention to the important stuff, so they don’t question things too closely.

Once we started taking readings ourselves and realized that we hadn’t previously had access to this kind of data, and nor had anyone else, that perception fell apart pretty quickly. I think the most important thing we’ve learned is that people don’t need to be reliant on some authority for information about their safety, and they can be better informed by getting involved themselves.

What did you learn from the ‘iGeigie’ and working on distributed projects?

Sean Bonner: The iGeigie was kind of a thought experiment, a proof of concept for the short term. With the manufacturing bottlenecks disappearing, it will be easier for people to get reliable devices and the parts to build devices themselves. Everything we’ve built is open source for that purpose, so people can get involved on their own terms.

How do you think about measuring the overall success of Safecast? How does that extend to this next phase?

Sean Bonner: On one hand, I think what we’ve pulled off with Safecast is already a clear success. We’ve shown that people can work together and move quickly to produce better results than large institutions and governments. We’ve collected more data in the last year than all previous efforts combined. If we packed up our bags and went home today, none of that would change, and that’s something I’m incredibly proud to have been a part of.

That said, I think there is still a lot we can do, both in data collection and outreach. While radiation has certainly been a hot topic, it’s distanced from many people. Interest in what’s happening “over there” can be passing. I think air quality is something that a larger audience will be able to relate to. It’s not happening somewhere else in the world: it’s right outside your door. So our potential impact is much greater.

How have government officials and regulators reacted to what you’ve done? Have they helped or hindered data collection? Have they used the data in making policy?

Sean Bonner: There’s no blanket answer to this. There have been government officials who have been very supportive of our work and taken steps to help us out, as well as those who have completely ignored our efforts.

I’m not sure how much policy has been affected. To be honest, that’s not something I care too much about. That people now have access to accurate data and can make difficult decisions based on concrete findings — rather than speculation — is what makes this worthwhile. That’s the kind of thing that policy should have taken care of before, but didn’t, so it’s kind of useless in my opinion. Results are what matter.

Why do open source hardware and open data about air quality matter?

Sean Bonner: The openness of both of these things matters because it lets people see how those results are obtained.

Which would you prefer: someone saying “here’s the data, just trust it” — or “here’s the data we found, here’s how we collected it, here’s data from other sources and how they all compare against each other, here’s the similarities and differences, and now you know everything we do?”

What could change because of this project?

Sean Bonner: When we started measuring in Fukushima, our first revelation was that the contamination and evacuation zones didn’t match up. The assumptions and the reality were different — in both directions.

But even the people who found out that they were living near contamination at least now had proof of that and data to show exactly what they were dealing with, rather than worrying about something they couldn’t actually identify.

In talking to people about air quality, everyone has a story about some place they know of that is really polluted or has really bad air — but they only know that because of what they’ve heard, not actual evidence they can point to. I think once we start taking measurements, we’ll be able to confirm some of those as well as rule out others. And once we take enough readings we may even be able to show sources of the pollution, and give people actual evidence to that effect rather than just speculation.

How will you invest this money in the project? How can the open source, maker and civic hacking communities help or collaborate with you?

Sean Bonner: A lot of this money will be spent on hardware, as well as bringing our distributed teams together regularly, which helps maintain focus.

Once we finalize the sensor and devices we’ll be using for this, then it will be all about the data. For communities that want to help, the best thing they will be able to do is build their own sensors and publish the data back to us. Because of the hands-on making element to this, we’re hoping making and hacking groups will be quick to help us start tracking things in their communities.
