Profile of the Data Journalist: The Human Algorithm

Ben Welsh is augmenting the investigative work of the Los Angeles Times with open source coding.

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context and clarity and, perhaps most important, to find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference.

Ben Welsh (@palewire) is a Web developer and journalist based in Los Angeles. Our interview follows.

Where do you work now? What is a day in your life like?

I work for the Los Angeles Times, a daily
newspaper and 24-hour Web site based in Southern California. I’m a member
of the Data Desk, a team of reporters and
Web developers that specializes in maps, databases, analysis and
visualization. We both build Web applications and conduct analysis for
reporting projects.

I like to compare The Times to a factory, a factory that makes information.
Metaphorically speaking, it has all sorts of different assembly lines. Just
to list a few, one makes breaking local news, another makes beautifully rendered narratives, another makes battleship-like investigative projects.

A typical day involves juggling work on different projects, mentally
moving from one assembly line to the other. Today I patched an embryonic open-source release, discussed our next move on a pending public records request, guided the real-time publication of results from the GOP primaries in Michigan and Arizona, and did some preparation for how we’ll present a larger dump of results on Super Tuesday.

How did you get started in data journalism? Did you get any special
degrees or certificates?

I’m thrilled to see new-found interest in “data journalism” online. It’s
drawing young, bright people into the field and involving people from
different domains. But it should be said that the idea isn’t new.

I was initiated into the field as a graduate student at the Missouri School
of Journalism. There I worked at the National Institute for Computer-Assisted Reporting , also known as NICAR. Decades before anyone called it “data journalism,” a disparate group of misfit reporters discovered that the data analysis made possible by computers enabled them to do more powerful investigative reporting. In 1989, they founded NICAR, which has, for decades, been training data skills
to journalists and nurtured a tribe of journalism geeks. In the time since, computerized data analysis has become a dominant force in investigative reporting, responsible for a large share of the field’s best work.

To underscore my point, here’s a 1986 Time magazine article about how
“newsmen are enlisting the machine.”

Did you have any mentors? Who? What were the most important resources they
shared with you?

My first journalism job was in Chicago. I got a gig working for two great people there, Carol Marin and Don Moseley, who have spent most of their careers as television journalists. I worked as their assistant. Carol and Don are warm people and good teachers, but they are also excellent at what they do. There was a moment when I realized, “Hey, I can do this!” It wasn’t just something I heard about in class; it was something I could actually see myself doing.

At Missouri, I had a great classmate named Brian Hamman, who is now at the New York Times. I remember seeing how invested Brian was in the Web, totally committed to Web development as a career path. When an opportunity opened up to be a graduate assistant at NICAR, Brian encouraged me to pursue it. I learned enough SQL to help do farmed-out investigative work for TV stations. And, more importantly, I learned that if you had technical skills, you could get the job to work on a cool story.

After that I got a job doing data analysis at the Center for Public Integrity in Washington DC. I had the opportunity to work on investigative projects, but also the chance to learn a lot of computer programming along the way. I had the guidance of my talented coworkers, Daniel Lathrop, Agustin Armendariz, John Perry, Richard Mullins and Helena Bengtsson. I learned that computer programming wasn’t impossible. They taught me that if you have a manageable task, a few friends to help you out and a door you can close, you can figure out a lot.

What does your personal data journalism “stack” look like? What tools
could you not live without?

I do my daily development in Ubuntu Linux, spending most of my day flipping between the gedit text editor, Byobu’s slick implementation of the screen terminal and the Chromium browser. And, this part may be hard to believe, but I love Ubuntu Unity. I don’t understand what everybody is complaining about.

I do almost all of my data management in the Python Web development framework Django and the PostgreSQL database, even if the work is an exploratory reporting project that will never be published. I find that the structure of the framework can be useful for organizing just about any data-driven project.
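Welsh doesn’t walk through code in the interview, but the workflow he describes — loading data into a real database even for throwaway analysis — can be sketched with Python’s standard-library sqlite3 module standing in for his Django/PostgreSQL setup. The table and column names here are hypothetical examples, not from any actual Times project:

```python
import sqlite3

# An in-memory database as a lightweight stand-in for PostgreSQL.
# The "contributions" table is a hypothetical example dataset.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contributions (
        donor TEXT,
        amount REAL,
        city TEXT
    )
""")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [
        ("Smith", 500.0, "Los Angeles"),
        ("Jones", 1200.0, "Pasadena"),
        ("Smith", 250.0, "Los Angeles"),
    ],
)

# An exploratory reporting query: total giving per donor, largest first.
rows = conn.execute("""
    SELECT donor, SUM(amount) AS total
    FROM contributions
    GROUP BY donor
    ORDER BY total DESC
""").fetchall()
print(rows)  # → [('Jones', 1200.0), ('Smith', 750.0)]
```

The payoff of the database-backed approach is that questions like this stay one query away, even as the dataset grows or the story changes.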

I use GitHub for both version-control and
project management. Without it, I’d be lost.

What data journalism project are you the most proud of working on or
creating?

As we all know, there’s a lot of data out there. And, as anyone who works
with it knows, most of it is crap. The projects I’m most proud of have
taken large, ugly data sets and refined them into something worth knowing:
a nut graf in an investigative story, or a
data-driven app that gives the reader some new
insight into the world around them. It’s impossible to pick one. I like to
think the best is still, as they say in the newspaper business,
TK.
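The refining Welsh describes — turning large, ugly datasets into something worth knowing — often begins with mundane normalization so that records can actually be grouped and counted. A minimal, hypothetical sketch of that first step:

```python
from collections import Counter

# Hypothetical example of messy real-world values: the same city
# recorded with inconsistent whitespace and capitalization.
raw_records = [
    "  Los Angeles ",
    "los angeles",
    "LOS ANGELES",
    "Pasadena",
]

def normalize(value: str) -> str:
    """Trim and collapse whitespace, standardize case, so variants merge."""
    return " ".join(value.split()).title()

counts = Counter(normalize(r) for r in raw_records)
print(counts)  # → Counter({'Los Angeles': 3, 'Pasadena': 1})
```

Without the cleanup step, those four rows would count as four distinct cities; with it, the real pattern in the data becomes visible.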

Where do you turn to keep your skills updated or learn new things?

Twitter is a great way to keep up with what is getting other programmers excited. I know a lot of people find social media overwhelming or distracting, but I feel plugged in and inspired by what I find there. I wouldn’t want to live without it.

GitHub is another great source. I’ve learned so much just exploring other
people’s code. It’s invaluable.

Why are data journalism and “news apps” important, in the context of the
contemporary digital environment for information?

Computers offer us an opportunity to better master information, better
understand each other and better watchdog those who would govern us. At last week’s NICAR conference, in a talk called “Human-Assisted Reporting,” I tried to talk about some of the ways that simply thinking about the process of journalism as an algorithm can point the way. In my opinion, we should aspire to write code that embodies the idealistic principles and investigative methods of the previous generation. There’s all this data out there now, and journalistic algorithms, “robot reporters,” can help us ask it tougher questions.
