Making government health data personal again

An interview with Fred Smith of the CDC on their open content APIs.

Health care data liquidity (the ability of data to move freely and securely through the system) is an increasingly crucial topic in the era of big data. Most conversations about data liquidity focus on patient data, but other kinds of information need to be able to move freely and securely, too. Enter several government initiatives, including efforts at agencies within the Department of Health and Human Services (HHS) to make their content more easily available.

Fred Smith is team lead for the Interactive Media Technology Team in the Division of News and Electronic Media in the Office of the Associate Director for Communication for the U.S. Centers for Disease Control and Prevention (CDC) in Atlanta. We recently spoke by phone to discuss ways in which the CDC is working to make their information more “liquid”: easier to access, easier to repurpose, and easier to combine with other data sources.

Which data is available from the CDC APIs?

Fred Smith, CDC


Fred Smith: In essence, what we’re doing is taking our unstructured web content and turning it into a structured database, so it can be queried through an API and reused. It makes our content available for our partners to build into their websites, applications, or whatever else they’re building.

Todd Park likes to talk about “liberating data.” Well, this is liberating content. What dataset is more high-value than our own public health messaging? It incorporates not only HTML-based text; we’re also building this to include multimedia (podcasts, images, web badges, or other content) and to have all that content be aware of other content based on category or taxonomy. So it will be easy to query, for example: “What content does the CDC have on smoking prevention?”

Let’s say there was a survey on youth tobacco use. Instead of saying, “Congratulations, here’s 678,000 rows of the dataset,” we can say, “Here’s the important message that you can use in your state about what teens are doing in your particular area of the country.” We’re distilling information down to useful messages or relevant data visualizations, and then pointing back to the open datasets.
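As an editorial illustration only, the kind of taxonomy-aware query Smith describes might be sketched as follows. This is not the CDC's API: the `ContentItem` record, the sample catalog, the topic tags, and the `find_by_topic` helper are all hypothetical, and the content is held in memory rather than fetched from any real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """One piece of syndicated content (hypothetical record shape)."""
    title: str
    media_type: str       # e.g. "html", "podcast", "image"
    topics: frozenset     # taxonomy tags, stored lowercase


# Made-up sample catalog; titles and tags are illustrative only.
CATALOG = [
    ContentItem("Quit-smoking tips", "html", frozenset({"smoking prevention"})),
    ContentItem("Teen tobacco podcast", "podcast",
                frozenset({"smoking prevention", "youth health"})),
    ContentItem("Hand-washing poster", "image", frozenset({"hygiene"})),
]


def find_by_topic(catalog, topic, media_type=None):
    """Return items tagged with `topic`, optionally limited to one media type."""
    wanted = topic.strip().lower()
    return [
        item for item in catalog
        if wanted in item.topics
        and (media_type is None or item.media_type == media_type)
    ]


if __name__ == "__main__":
    # "What content is there on smoking prevention?"
    for item in find_by_topic(CATALOG, "Smoking Prevention"):
        print(f"{item.media_type}: {item.title}")
```

Because text and multimedia share the same topic tags in this sketch, a single query can return a web page and a podcast together, which is the "content aware of other content" idea in miniature.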

You mentioned making content available for your partners. Who are they?

Fred Smith: It’s a combination: other agencies inside HHS, such as FDA [the Food and Drug Administration] or NIH [National Institutes of Health]; other federal agencies like VA [the Department of Veterans Affairs] or DOD [Department of Defense]; state and local health departments; universities; hospitals; and non-profit organizations like the American Cancer Society, the American Heart Association, or other public health non-profits.

What do you hope people will do with the content?

Fred Smith: Communication hinges on knowing one’s audience. On the federal level, we have an understanding of the country as a whole. But in a given state or county, they may know that certain messages work better. So by enabling these credible, scientific messages to be reused, the people who are building products and might know their micro-audience better than we do can get the benefit of using evidence-based messaging tailored for their audiences.

For example, say that a junior in a high school somewhere in Nebraska has started to learn web programming and APIs, and wants to write an application that she knows will help students in her high school avoid smoking. She can build something with her high school’s colors or logo, but fill it with our scientific content. It helps information that improves people’s health reach the local level and achieve something the government couldn’t achieve on its own.

We took my daughter to the pediatrician a number of years ago, and the doctor was telling us about her condition, but it was something I’d never heard of before. The doctor said, “Just a moment…” and went to the computer and printed off something from cdc.gov and handed it to me. My first reaction was, “Whew, my baby’s going to be okay.” My second was, “Ooo, that’s the old web template.” My third was, “If that had been flowed into a custom template from my doctor’s office, I would have felt a lot more like my doctor knows what’s going on, even if the information itself came from CDC.” People trust their health care providers, and that’s something we want to leverage.

It seems that you’re targeting a broad spectrum of developers here, rather than scientists or researchers. Why that choice of audience?

Fred Smith: The scientific community and researchers already know about CDC and our datasets, and how to get hold of them. So just exposing the data isn’t the issue. The issue is more: How can we expand the impact of these data? Going back to the digital government strategy, the reason that the federal government is starting to focus on opening these datasets, opening APIs, and going more mobile is to expand our citizen services, with the goal of getting the information, and what it all means, out to the public more effectively.

It’s a question of transparency, but throwing open the data is only part of it. Very few people really want to spend time analyzing a two-million-row dataset.

Are you finding any resistance to echoing government messaging, or are people generally happy to redistribute the content?

Fred Smith: We’re fortunate at the CDC that we have strong brand recognition and are considered very trustworthy and credible, and that’s obviously what we strive for. We sometimes get push-back, but generally our partners like to use the information and they were going to reuse it anyway; this just gives them a mechanism to use it more easily.

We work with a lot of state and local health departments, and when there’s some kind of outbreak (SARS, for example), we often start out with a single page. SARS was new and emerging to the entire world, the CDC included. We were investigating rapidly, and in the course of a few days, we went from one page to dozens or more; our website was constantly being updated. But we’ve got these public health partners who are not geared up for 24/7/365 operations the way we are. The best they could do was link out to us and hope that their visitors followed those links. In some cases, they copied and pasted, but they couldn’t keep up with events. So opening an API into the content, which lets them use our JavaScript widget, means that their content stays up-to-date and their recommendations stay current.

How is this project related to the Blue Button initiative, if at all?

Fred Smith: It’s not, really. That’s focused on an EHR [electronic health record], and health records are essentially doctors’ notes written for other doctors; they are not necessarily notes written out to the patient. Content services or content syndication could be leveraged to put a little context around that health record. For example, someone could write an application so that when you downloaded the data from the Blue Button, unknown terms could be looked up and linked from the National Library of Medicine. It could supplement your health record with the science and suggestions from the CDC and other parts of HHS. We think it would be a great add-on.
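To make the add-on idea concrete, here is a minimal editorial sketch of how an application might attach plain-language explanations to terms found in a downloaded record. Everything in it is an assumption for illustration: the two-entry `GLOSSARY` is hard-coded, the `annotate_terms` helper is hypothetical, and no National Library of Medicine or Blue Button service is actually called.

```python
import re

# Tiny illustrative glossary; a real application would look terms up in an
# external source instead of hard-coding them.
GLOSSARY = {
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
}


def annotate_terms(record_text, glossary):
    """Return (term, explanation) pairs for glossary terms found in the text.

    Matching is case-insensitive and on whole words; results follow the
    glossary's own order.
    """
    found = []
    for term, explanation in glossary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, record_text, flags=re.IGNORECASE):
            found.append((term, explanation))
    return found


if __name__ == "__main__":
    note = "Hx of Hypertension; prior myocardial infarction in 2009."
    for term, explanation in annotate_terms(note, GLOSSARY):
        print(f"{term}: {explanation}")
```

In a fuller version, the lookup step would be the place where syndicated content from the CDC or other parts of HHS supplies the context around each term.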

What data might be added to the API in the near future?

Fred Smith: The multimedia part will be added in the next 8–10 months. CDC has a number of datasets that are already publicly available, but many of them don’t yet have a RESTful API into them, particularly some of the smaller databases. So we’re looking at what we can open up.

And we’re not only doing this here at CDC. This idea of opening up a standard API into our content, including multimedia, is a joint effort among several agencies within HHS. We’re in varying stages of getting content into these, but we’re working to make this system interchangeable so this content can flow more easily from place to place. Most people don’t care how the federal government is organized or what the difference in mission is between NIH and CDC, for example. The more we can use these APIs to break down some of those content silos, the better it is for us and for the general public.

We’re excited about that, and excited that this core engine is an open source project — we’ve released it to SourceForge. A lot of our mobile apps use this same API, and we’ll be releasing the base code for those products as well in the next couple of months.

This interview was edited and condensed.


If the disruption of health care and associated opportunities interests you, O’Reilly has more to offer. Check out our ongoing coverage; our report, “Solving the Wanamaker problem for health care”; and the upcoming Strata Rx conference in Boston, September 25-27.
