How NPR is Embracing Open Source and Open APIs

Daniel Jacobson Will Talk About the NPR Open API at OSCON

Running time: 14:14

News providers, like most content providers, are interested in having their content seen by as many people as possible. But unlike many news organizations, whose primary concern may be monetizing their content, National Public Radio is interested in turning it into a resource for people to use in new and novel ways as well. Daniel Jacobson is in charge of making that content available to developers and end users in a wide variety of formats, and has been doing so using an Open API that NPR developed specifically for that purpose. Daniel will talk about how the project is going at OSCON, the O’Reilly Open Source Convention. Here’s a preview of what he’ll be talking about.

James Turner: Can you start by explaining what NPR Digital Media is and what your role with it involves?

Daniel Jacobson: Sure. NPR is a radio organization, of course, and the Digital Media Group, of which I’m a part, handles, essentially as I describe it, everything that is publishable by NPR that does not go to a radio. So that includes the website, podcasts, API, mobile sites, HD radios, anything that has some sort of visual component to it. So Digital Media as a group is responsible for producing that content, producing all of those distribution channels, managing all of those relationships.

James Turner: And what is your particular role there?

Daniel Jacobson: I manage the application development team that is responsible for all the functional aspects of all of the systems, which includes our CMS, all of the templating engines for the website, for the API, for the podcasts, all of the engines that drive that.

James Turner: Now NPR is an organization that consists of a lot of member stations kind of flying in close formation. What’s your relationship with the content producers? To what extent do they have their own stuff, and to what extent do you work together?

Daniel Jacobson: Those member stations are really exactly that; they are members of NPR. They essentially buy NPR programming. They’re distinct organizations from us. NPR is a content producer and distributor. They buy our programming and broadcast it out to the world. They also have their own corresponding web teams that can take NPR content and also produce their own content and create their own websites. So in the Digital Media Team, we take a lot of pride and effort in providing services that help those member stations better serve their communities and their listeners and audiences, using NPR content and using their own content. We work with them to try and satisfy their missions. And to the extent that they need NPR services or content, we work hard to try and provide those. The API is one massive step, I think, in making it much easier for them to do what they need to do without a whole lot of intervention from us, where previously they would have to pull in content in much more arduous ways. So the API, I think, is a step in the right direction to make it more of a self-service model.

James Turner: Since you’ve mentioned the API, that’s what you’re going to be talking about at OSCON. We’ve already talked to the New York Times and the way they’re opening up their content through APIs. What are you doing with yours?

Daniel Jacobson: Well, we launched ours formally at OSCON last year. And at that time, we essentially opened up our entire archive. So anything that you can get on the website is available through the API, to the extent that we have the rights to distribute it. There are some rights restrictions, for example, for receiving photos or stories from sources that we have not cleared rights to redistribute. Those are getting suppressed through a rights filtering engine on our API. Everything else that you can get on the website, you can get through the API. That includes full text. It includes images, audio, video, everything like that. Throughout the last year, we have added more features. We included the layer of “mix your own podcast”, for example, which allows people to not only get the content in audio form, but also to download it as a podcast-type item. And all of that is available through search terms or totally customized queries. So what the API really does is it enables people to take the content, make widgets, or do whatever they want with essentially everything that is on the website and get to audiences that we are not getting to.
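
Editor's sketch: the "search terms or totally customized queries" described above can be pictured as a URL carrying a few parameters. The Python below is a minimal illustration only. The endpoint `https://api.example.org/query` and the parameter names `searchTerm`, `output`, and `apiKey` are hypothetical placeholders, not taken from the interview or from NPR's documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
# They are NOT NPR's real API; consult the official documentation for that.
BASE_URL = "https://api.example.org/query"


def build_query_url(search_term, output_format="Atom", api_key="YOUR_KEY"):
    """Return a query URL for a search term in the requested output format."""
    params = {
        "searchTerm": search_term,   # free-text search (assumed name)
        "output": output_format,     # e.g. an RSS, Atom, or podcast feed (assumed name)
        "apiKey": api_key,           # per-developer key (assumed name)
    }
    # urlencode escapes spaces and special characters in the values.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("open source", output_format="Podcast"))
```

A client built this way only assembles requests; what comes back, and what is suppressed for rights reasons, is decided on the server side.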

James Turner: This probably isn’t as much of a factor for you because, in some ways, you’re not dependent on the same kind of revenue streams as a lot of news and content providers. But when you provide that kind of access, isn’t there a bit of a fear that you can get even more to the point where portal sites and aggregators like Google can essentially steal your traffic?

Daniel Jacobson: Well, I’m glad you brought that up. We do have terms of use that do mitigate people from essentially creating another archive of NPR content. So we’re not encouraging people to just go forth and take our content and do everything that we’re doing. We want people to use this in a way that is providing a value-add. We don’t want them to archive and duplicate our stuff. We also want it to be for personal or noncommercial use. Anybody who wants to use it for commercial use, which would include Google or whomever else, needs to talk to us and set up an arrangement, some sort of contract or agreement with us. So that said, we do want to encourage people to take this content and do very creative things. And that was the purpose of opening up.

James Turner: So it’s been open for a while. What are some of the things you’ve seen people do with it?

Daniel Jacobson: Well, one of the most interesting things is we call it the Flubacher app which is essentially an iPhone application that somebody in the world built. His name is Brad Flubacher. And he’s essentially taking the API content and putting it into this iPhone app. And you can stream content within the iPhone app, all of our programs or our topics. And when I say stream, it’s essentially doing API queries every time you make a request. So he’s not archiving all of this content. It’s just basically a pass-through engine. It’s been very popular and a very interesting application.

A lot of our member stations are doing very creative things. Minnesota Public Radio, for example, just launched their new site. And they’re making extensive use of the API. North Country Public Radio is another one where they’ve said that they have, I think, 50 percent of their pages or so have NPR API content on it. So our member stations are making heavy use.

I’ve seen a lot of instances of people making code wrappers. There’s a Ruby on Rails code wrapper for our API. There’s a Perl one that someone just created. So a lot of people are out there doing very clever things with it. And we’re just looking forward to more and more uses.

James Turner: So obviously you’re familiar and probably a fan of open source. How is NPR using open source technologies?

Daniel Jacobson: So internally, with the exception of our database, all of our systems are employing open source technologies. I assume that’s what you mean. What open source technologies are we using? So our database is Oracle. And our plan is to migrate that to MySQL. But over the last couple of years, we’ve really adopted open source more and more. All of our coding engines, and we’re not using any proprietary application servers or anything like that, it’s all open source, Apache, all that kind of stuff.

It’s very important to us that we keep the open source model. And as we look more towards open source, we’re kind of changing our vision to be less of a consumer of the open source products and more of a contributor. And I think that’s what you see with the API. It’s the first step to say, “How can we contribute back to this community and give them the things that we’re good at?” Which would be content, in this case. And more and more, we’re going to start looking towards opening up our applications and saying, “Here, go fork this. Go make interesting things with it.” And I think over the next couple of months you’re going to see a lot more open source applications coming from NPR.

James Turner: As I mentioned earlier, the New York Times has got their open API. You have an API. Is there any effort going on to try to standardize for this type of content a single API that would allow people to use common code throughout all of these data sources?

Daniel Jacobson: That’s a great question. I’m involved in a resource group for PBCore, as an example. PBCore was really set up to be a public broadcasting core, but there are a lot of other organizations that are starting to adopt it as more of a standard for passing data back and forth between the organizations. I’m not sure if that’s going to be as pervasive in the overall marketplace. With respect to New York Times and other organizations that are outside of that circle of PBCore, we actually haven’t had many conversations about formalizing some sort of standard across us. I think that’s a very interesting idea. That said, there are already a host of standards out there in the world. And NPR has tried with our API to really make the API adhere to as many standards as possible.

We have our own custom tagging language which we call NPRML, which we built, and which is essentially the language or the XML structure that essentially closely mirrors to our content. But we can now put all of our content in media RSS or podcast RSS or Atom, or I think there are a total of eight or nine total outputs. And next on the docket will be NewsML and PBCore or probably PBCore first. And so we’re trying to make our content as standards compliant as possible.

I think your question is, is there some other standard that would allow for richer content to be standardized across all of these news organizations. It’s a really interesting question. I don’t know that all of the organizations are going to have the philosophy of opening up as much as NPR has. So, for example, New York Times does not offer full text content in theirs; we do. Our source is really heavily weighted towards audio and theirs isn’t. So there are going to be some differences across them that make it a little bit more challenging.

But we are collaborating a lot with these organizations. I also want to add that New York Times and NPR will be hosting a mash-up camp at OSCON on Friday. And this is an example of one of those steps where we’re really trying to play nicely with all of these other organizations and trying to unify in front of the public, you know, “We are both media organizations. We want to get everybody kind of focused on the same concepts.” I think your proposal of a next step towards a standard of process might come down the road.

James Turner: What do you see coming on the horizon both for NPR and if you want to put on your oracle hat, more generally in the news business?

Daniel Jacobson: Well, for NPR, digital is obviously very important as it is for most other media organizations. And over the next several months, you’re going to see a lot of changes for NPR. We are focusing a lot of energy towards distribution channels, portability. I think portability is a huge factor in this marketplace. And you’re asking about down the road. My view is I really see webpages and websites, browser-based, PC-based experiences, they’re going to start diminishing in importance. I don’t know exactly what the timeframe is. It could be a couple of years. It could be five years. I don’t know. But at some point, it’s going to plateau and mobile’s going to surpass it. And having content be portable is going to be paramount.

So I think that NPR’s philosophy is going to mirror with that. We’re putting a big emphasis on portability. That’s why the API is so critical, not only for end-users in the world but also for all of our business needs. We spend a lot of time with business partners, getting them to understand the API so that they can more easily tap into our content and service in their environment. So it’s all about distribution at this point for us. And I think over the next three to five years, you’re going to see a lot more people consuming NPR content on the go rather than in front of the computer.

James Turner: It sounds like you’re going to be fairly busy at OSCON, but is there anything beyond the stuff that you’re participating in that’s caught your eye or has you excited?

Daniel Jacobson: I will be honest. I’m going to be at OSCON for about a day-and-a-half, and that’s because we have some major launches later this month. So I’ve got to swing in, do my stuff, and swing out, which is regretful. But there are a couple sessions that I did notice. I think there were some talks about microformats and, of course, portability, and HTML 5. Those were the things that caught my eye.

James Turner: All right. Well, Daniel, thank you so much for taking the time to talk to us. And it’ll be great to be hearing more from NPR.

Daniel Jacobson: Great. Thank you so much.

  • On his This American Life Podcast #382, Ira Glass notes that TAL currently costs $120K/year for distribution bandwidth. I’m sure other NPR shows have similar costs.

    James, do you know if NPR has considered using BitTorrent as an alternative means of distributing their shows? If that practice caught on, that could take a big chunk out of NPR’s bandwidth costs. I’d be willing to be a seed for NPR broadcasts that I listen to.

    A big issue would be the stigma that BitTorrent has had in the past as a conduit for illegal downloads. That’s a bit of a chicken-and-egg problem: if NPR and others started using torrents for legitimate media, its reputation would improve.

    I’d love to help NPR lower their costs and simultaneously see the world shift to a community model for distributing podcasts.

  • This is a great piece on ‘portability’ of content! As we evolve into more mobile services the need for smart devices to share on a community level evolves as well. I really believe the GREEN movement has us all seeking ways to participate in our own industries with minimal impact and still move at the speed of light, or faster. The need for portability and mobility has driven communications into innovative delivery systems.
    Working in the event/advertising industry (large format printing) has benefited from the digital delivery in content. We are launching a digital banner stand in September that has HD quality, portability, changeable content options to capture the next wave of demand for faster ad campaigns to be delivered anywhere. dBs will be the next wave.

  • I agree that NPR should be using BitTorrent for distributing its downloads. There’s no excuse for asking your listeners for financial donations, and then going and spending those donations to buy bandwidth that those same listeners could be donating to you using BitTorrent, at a lower cost to them and a much higher ratio of contributors to listeners. is a good example of a site that supports both HTTP and BitTorrent downloads of their audio, but encourages BitTorrent downloads by delaying the HTTP download. (They could do better by seeding all their torrents, though. At present, if you try to download an unpopular album, you may end up getting it faster via HTTP.)

  • Very interesting article, and in particular the comment from readers at the end about using peer-to-peer as a content distribution mechanism for NPR digital files.

  • Thanks for the comments. First of all, I want to make the distinction that This American Life is not an NPR program. Rather, it is produced by WBEZ in Chicago and distributed by Public Radio International. The solicitations for donations to support TAL’s bandwidth are to help fund WBEZ, a distinct public radio station that airs NPR programming in addition to programming from other sources, including PRI, APM and itself. That said, as you point out bandwidth costs do add up and NPR (as well as other public radio organizations and for that matter, any other content distributor) could benefit from distributing the expense as well.

    So, the BitTorrent suggestion is a very interesting one. NPR is open to, and is actively exploring all opportunities to reduce cost, including BitTorrent. In many ways, a peer-to-peer distribution channel makes a lot of sense. However, there are a few major questions that arise that so far have prevented NPR from implementing this. Some of these questions pertain to reliability, rights management, sponsorship and metrics.

    As an example, good rights management is critical for major media organizations. In some cases, NPR updates source files (or even removes them) due to corrections or rights restrictions. In a distributed network where NPR is abstracted away from the content, can NPR ensure that the updates override the originals for all subsequent downloads?

    Additionally, if we distribute through peer-to-peer networks, we also need to reconcile the challenges related to sponsorship and metrics. Will we have enough control over the assets to be able to apply our sponsorships to them? Will we get adequate metrics on those files delivered? Can we differentiate reliably between full and partial downloads? These metrics help us refine our products and make them better for our users. Moreover, without them, we also cannot secure attractive sponsors.

    We will continue to explore this and other options. If we can find some way to ensure our other goals are satisfied within that framework, it would be foolish for us not to consider using that kind of technology. If you know other examples that have been able to solve the various reliability, rights, sponsorship and metrics within that architecture, please let me know!

    Again, thanks for the comments and the ideas.

  • The requirement to register completed BitTorrent downloads could be added to the protocol. This is something that the majority of legitimate providers would want and that downloaders of legitimate media should not object to. If publishers are interested in those metrics, they should work with the maintainers of BitTorrent to add that capability.

    Part of what’s in a torrent is a checksum for the whole file. If you delete the torrent and publish a new one, the updated torrent would then be the one that would be findable by anyone seeking that particular media. That would not stop downloads of the old torrent in progress. On the other hand, there are always going to be old copies of shows in a variety of places on the internet no matter what the distribution means.

    Thanks for responding to the comment thread, Daniel.
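
Editor's sketch: the point above about checksums can be made concrete. In the original BitTorrent metainfo format, integrity is checked with SHA-1 hashes of fixed-size pieces rather than a single whole-file checksum, but the consequence is the same: a corrected file no longer matches the old torrent, so a new torrent has to be published. The piece length and sample data below are arbitrary assumptions chosen for illustration, and the function name `piece_hashes` is hypothetical.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int = 16384) -> list[str]:
    """Split data into fixed-size pieces and return each piece's SHA-1 hex digest."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


if __name__ == "__main__":
    original = b"A" * 40000
    # Simulate a correction: one byte of the published file changes.
    corrected = b"A" * 20000 + b"B" + b"A" * 19999

    old = piece_hashes(original)
    new = piece_hashes(corrected)

    # Indices of pieces whose hashes no longer match the old torrent.
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    print(f"{len(old)} pieces, changed piece indices: {changed}")
```

Peers holding the old torrent would reject the corrected pieces as corrupt, which is why an update means issuing a new torrent rather than silently replacing the file.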

  • You might want to look at – this is a good example of how people (educators in this case) take content provided by other organizations or individuals – revise (or improve it) and make it available for others to use. I like the ability to take content (in whole or in part) that others have provided – embed it into my content – assemble a new package – and make it available for others to use. I also like to see how others have rated the content, how they have used it, etc. Content is the service – not the thing worth protecting with IP or worth monetizing….one man’s humble opinion.