Profile of the Data Journalist: The API Architect

Jacob Harris is building APIs and data into elections coverage at the New York Times.

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context and clarity and, perhaps most important, to find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference.

Jacob Harris (@harrisj) is an interactive news developer based in New York City. Our interview follows.

Where do you work now? What is a day in your life like?

I work in the Interactive Newsroom team at the New York Times. A day in my life is usually devoted to coding rather than meetings. Currently, I am almost exclusively devoted to the NYT elections coverage, where I switch between the operations of loading election results from the AP or building internal APIs that provide data to our various parts of elections.nytimes.com. I also sometimes help fix problems in our server stack when they arise or sometimes get involved in other projects if they need me.

How did you get started in data journalism? Did you get any special degrees or certificates?

I have a classical CS education, with a combined B.A./M.Eng from MIT. I have no journalism background or experience. I never even worked for my newspaper in college or anywhere. I do have a profound skepticism and contrarian nature that does help me fit in well with the journalists.

Did you have any mentors? Who? What were the most important resources they shared with you?

I don’t have any specific mentors. But that doesn’t mean I haven’t been learning from anybody. We’re in a very open team and we all usually learn things from each other. Currently, several of the frontend guys are tolerating my new forays into JavaScript. Soon, the map guys will learn to bear my questions with patience.

What does your personal data journalism “stack” look like? What tools could you not live without?

Our actual web stack is built on top of EC2, with Phusion Passenger and Ruby on Rails serving our apps. We also use haproxy as a load balancer. Varnish is an amazing cache that everybody should use. On my own machine, I do my coding currently in Sublime Text 2. I use Pivotal Tracker to track my coding tasks. I could probably live with a different editor, but don’t take my server stack away from me.

What data journalism project are you the most proud of working on or creating?

I have two projects I’m pretty proud of working on. Last year, I helped out with the Wikileaks War Logs reporting. We built an internal news app for the reporters to search the reports, see them on a map, and tag the most interesting ones. That was an interesting learning experience.

One of the unique things I figured out was how to extract MGRS coordinates from within the reports to geocode the locations inside of them. From this, I was able to distinguish the locations of various homicides within Baghdad more finely than the geocoding for the reports. I built a demo, pitched it to graphics, and we built an effective and sobering look at the devastation on Baghdad from the violence.
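For readers unfamiliar with MGRS (the Military Grid Reference System), the sketch below illustrates the general idea of pulling grid references out of free text. It is not the code used at the Times: the pattern, the `MgrsRef` type, the `extract_mgrs` function and the sample sentence are all assumptions made for illustration, and turning a reference into latitude and longitude would still require a separate conversion step.

```python
import re
from typing import List, NamedTuple


class MgrsRef(NamedTuple):
    """One MGRS reference split into its parts (hypothetical helper type)."""
    zone: str      # grid zone designator, e.g. "38S"
    square: str    # 100 km square identifier, e.g. "MB"
    easting: str   # easting digits
    northing: str  # northing digits (same length as easting)


# Assumed pattern: zone number, latitude band (C-X without I and O),
# two square letters (no I or O), then either two space-separated digit
# groups or one contiguous run of digits.
MGRS_RE = re.compile(
    r"\b(\d{1,2})([C-HJ-NP-X])\s?([A-HJ-NP-Z]{2})\s?"
    r"(\d{1,5}\s\d{1,5}|\d{2,10})\b"
)


def extract_mgrs(text: str) -> List[MgrsRef]:
    """Return every plausible MGRS reference found in free text."""
    refs = []
    for match in MGRS_RE.finditer(text):
        zone_number, band, square, digits = match.groups()
        if not 1 <= int(zone_number) <= 60:
            continue  # UTM zones run from 1 to 60
        parts = digits.split()
        if len(parts) == 2:
            easting, northing = parts
        else:
            run = parts[0]
            if len(run) % 2:
                continue  # a contiguous run must split evenly in two
            half = len(run) // 2
            easting, northing = run[:half], run[half:]
        if len(easting) != len(northing):
            continue  # easting and northing share one precision
        refs.append(MgrsRef(zone_number + band, square, easting, northing))
    return refs


# Made-up sample sentence, not taken from any real report.
sample = "Blast reported at 38SMB4550085600, second at 38S MB 4488 8493."
for ref in extract_mgrs(sample):
    print(ref)
```

A pattern like this will produce some false positives in real text, so matches would normally be checked against the area a report is known to cover before being plotted.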

This year, I am working on my third election as part of Interactive News. Although we are proud of our team’s work in 2008 and 2010, we’ve been trying some new ways of presenting our election coverage online and new ways of architecting all of our data sources so that it’s easier to build new stuff. It’s been gratifying to see how internal APIs combine with novel storytelling formats and modern browser technologies this year.

Where do you turn to keep your skills updated or learn new things?

Usually, I just find out about things by following all the other news app developers on Twitter. We’re a small ecosystem with lots of sharing. It’s great how everybody learns from each other. I have created a Twitter list @harrisj/news-hackers to help keep tabs on all the cool stuff being done out there. (If you know someone who should be on it, let me know.)

Why are data journalism and “news apps” important, in the context of the contemporary digital environment for information?

We live in a world of data. Our reporting should do a better job of presenting and investigating that data. I think it’s been an incredible time for the world of news applications lately. A few years back, it was just an achievement to put data online in a browsable way.

These days, news applications are at a whole other level. Scott Klein of ProPublica put it best when he described all good data stories as including both the “near” (individual cases, examples) and the “far” (national trends, etc.).

In an article, the reporter would pick a few compelling “nears” for the story. As a reader, I also would want to know how my school is performing or how polluted my water supply is.

This is what news applications can do: tell the stories that are found in the data, but also allow the readers to investigate the stories in the data that are highly important to them.

This interview has been edited and condensed for clarity.
