Finding and telling data-driven stories in billions of tweets

Simon Rogers

Twitter has hired its first data editor. Simon Rogers, one of the leading practitioners of data journalism in the world, will join Twitter in May. He will be moving his family from London to San Francisco and applying his skills to telling data-driven stories using tweets. James Ball will replace him as the Guardian’s new data editor.

As a data editor, will Rogers keep editing and producing something that we’ll recognize as journalism? Will his work at Twitter be different than what Google Think or Facebook Stories delivers? Different in terms of how he tells stories with data? Or is the difference that Twitter has a lot more revenue coming in or sees data-driven storytelling as core to driving more business? (Rogers wouldn’t comment on those counts.)

The gig clearly has potential and Rogers clearly has demonstrable capacity. As he related to me today, in an interview, “what I’m good at is explaining data, simplifying it and making it accessible.”

That’s a critical set of skills in business, government, or media today. Data-driven journalists have to understand data sources, quality, context, and underlying biases. That’s equally true when the data comes from Twitter. Pew Research reminded us in 2013 that Twitter is not representative of everyone and is often at odds with public opinion.

Tweets aren’t always a reliable source for understanding everything that happens in the world, but it’s undeniable that useful insights can be found there. Twitter has become a core component of the set of digital tools and platforms that journalists apply in their work, alongside smartphones, pens, water bottles and notebooks. News frequently breaks on Twitter first and is shared by millions of users independently of any media organization. Journalists now use Twitter to ply a trade that’s well over a century old: gather and fact-check reports, add context, and find the truth of what’s happening. (Picking up the phone and going on location still matter, naturally.) The amount of misinformation on Twitter during major news events puts a high premium on the media debunking rumors and sharing accurate facts.

Will the primary difference in Rogers’ ability to find truth and meaning in the tweets be access to Twitter’s full firehose, developers, and processing power? His work will have to be judged on its own merits. Until he starts his new gig in May, the following interview offers more insight into why he joined Twitter and how he’s thinking about what he’ll be doing there.

Why leave the paper now?

Simon Rogers: I love the Guardian and have always wanted to work here. I grew up in a house where we read two papers: The Guardian during the week and the Observer on Sundays. I’ve had offers but this is the first job where it’s become a serious possibility.

There are a few reasons.

Firstly, Twitter is an amazing phenomenon. It’s changed every level of how we work as reporters. We really saw that during the “Reading the Riots” project. There we had 2.6 million riot-related tweets which Twitter gave us to analyze.

London Riots

That was important because politicians were agitating about the ‘role’ of Twitter during the disturbances. The work that our team did with academics at Manchester and the subsequent interactive produced by Alastair Dant and the interactive team here opened my eyes to the facts that:

Twitter and the way it’s used tells us a lot about every aspect of life
The data behind those tweets can really shine a light on the big stories of the moment
If you can combine that data with brilliant developers you have a really powerful tool

Secondly, Twitter is an amazing place from what I’ve seen so far. There’s a real energy about the place and some brilliant people doing fascinating things. I love the idea of being part of that team.

Thirdly, I’ve been at the Guardian nearly 15 years. I am so comfortable and confident in what I do there that I need a new challenge. This all just came together at the right time.

As a data-driven journalist, you’ve had to understand data sources, quality, context, and underlying biases. How does that apply to Twitter?

Simon Rogers: Absolutely. Mark Twain said “a lie can be halfway around the world before the truth has got its boots on.” All social media encourages that.

I think the work we did with the riot tweets shows how the truth can catch up fast. What interested me about Boston was the way that people were tweeting calmness, if you like.

I think we’ve seen this with the Datablog in general: that people used to worry that the masses weren’t clever enough to understand the data that we were publishing. In fact, the community self-rights itself, correcting errors other readers or even ourselves had perpetrated. That’s really interesting to me.

What will you be able to do at Twitter with data that you couldn’t do at the Guardian data desk?

Simon Rogers: Just to be there, in the midst of that data will be amazing. I think it will make me better at what I do. And I hope I have something to offer them too.

Will you be using the same tools as you’ve been applying at the Guardian?

Simon Rogers: I’m looking forward to learning some new ones. I’m comfortable with what I know. It’s about time I became uncomfortable.

Twitter has some of the world’s best data scientists. What makes being a data editor different from being a data scientist?

Simon Rogers: I’m not the world’s best statistician. I’m not even very good at maths. I guess what I’ve been doing at The Guardian is acting as a human bridge between data that’s tricky to understand and a wider audience that wants to understand it. Isn’t that what all data journalism is?

My take on being a data editor at the Guardian was that I used it as a way to make data more accessible – crucially the understanding of it. I need to understand it, to make it clear to others and I want to explain that data in ways that I can understand. Is that the difference between data editors and data scientists? I don’t know – I think a lot of these definitions are artificial anyway.

It’s like people getting data journalism and data visualization mixed up. I think they are probably different things and involve different processes, but in the end, does it matter anyway?

This interview was edited and condensed.

