This interview is part of our ongoing look at the people, tools and techniques driving data journalism.
I first met Justin Arenstein (@justinarenstein) in Chişinău, Moldova, where the media entrepreneur and investigative journalist was working as a trainer at a “data boot camp” for journalism students. The long-haired, bearded South African instantly makes an impression with his intensity, good humor and focus on creating work that gives citizens actionable information.
Whenever we’ve spoken about open data and open government, Arenstein has been a fierce advocate for data-driven journalism that not only makes sense of the world for readers and viewers, but also provides them with tools to become more engaged in changing the conditions they learn about in the work.
He’s relentlessly focused on how open data can be made useful to ordinary citizens, from Africa to Eastern Europe to South America. For instance, in November, he highlighted how a simple website built with modern web-based tools and technologies helped data journalism boost voter registration in Kenya.
For the last 18 months, Arenstein has been working as a Knight International Fellow embedded with the African Media Initiative (AMI) as a director for digital innovation. The AMI is a group of the 800 largest media companies on the continent of Africa. In that role, Arenstein has been creating an innovation program for the AMI, building more digital capacity in countries that are as in need of effective accountability from the Fourth Estate as any in the world. The digital disruption that has upended media business models elsewhere hasn’t yet played itself out in Africa because of a number of factors, explained Arenstein, but he estimates that it will arrive within five years.
“Media wants to be ready for this,” he said, “to try and avoid as much of the business disintegration as possible. The program is designed to help them grapple with and potentially leapfrog the coming digital disruption.”
In the following interview, Arenstein discusses the African media ecosystem, the role of Hacks/Hackers in Africa, and expanding the capacity of data journalism.
Why did you adopt the Hacks/Hackers model and scale it? Why is it relevant to what’s happening around Africa?
Justin Arenstein: African journalists are under-resourced but also poorly trained, probably even more so than in the U.S. and elsewhere. Very, very few of them have any digital skills, never mind coding skills. Simply waiting for journalists to make the leap themselves and start learning coding skills and more advanced digital multimedia content production skills is just too — well, we don’t have enough time to do that, if we’re going to beat this disruption that’s coming.
The idea was to clone parts of the basic model of Hacks/Hackers from the U.S., which is a voluntary forum and society where journalists, UI people, designers, graphics people and coders meet up on a regular basis.
Unlike in the U.S., where Hacks/Hackers is very focused on startup culture, the African chapters have been very focused on data-driven journalism and imparting some basic skills. We’re trying to avoid some of the pitfalls experienced in the U.S. and get down to using data as a key tool in creating content. A big weakness in a lot of African media is that there’s very little unique content, firstly, and that the unique content that is available is not particularly well produced. It’s not deep. It’s not substantiated. It’s definitely not linked data.
We’ve been focusing on improving the quality of the content so that the companies where these journalists work will be able to start weaning themselves from some of the bad business practices that they are guilty of and start concentrating on building up their own inventory. That’s worked really well in some of the African countries along the coastlines where there’s data access, because you’ve got cables coming in. In the hinterland of Africa, data and Internet are not widely available. The Hacks/Hackers chapters there have been more like basic computer-assisted reporting training organizations.
Like in the U.S., they all run themselves. But unlike in the U.S., we have a structured agenda, a set of protocols, an operating manual, and we do subsidize each of the chapters to help them cover their physical costs. They’re not quite as voluntary as the U.S. ones; it’s a more formal structure. That’s because they’re designed to surface good ideas, to bring together a combination of people and skills that you wouldn’t ordinarily find, in the media ecosystem at least, and then to help kick-start experimentation.
Do you see any kind of entrepreneurial activity coming out of them now?
Justin Arenstein: I’m not aware of any notable startups. We’ve had ideas where people are collaborating to build toward startups. I haven’t seen any products launched yet, but what we have seen is journalist-led startups that were outside of these Hacks/Hackers chapters now starting to come into the fold.
Why? Because this is where they can find some of the programming and engineering skills that they need, that they were struggling to find outside of the ecosystem. They are finding engineers or programmers, at least, but they’re not finding programmers who are tuned to content needs or to media philosophies and models. There’s a better chance that they’ll find those inside of these chapters.
The chapters are fairly young, though. The oldest chapter is about six months old now, and still fairly small. We’re nowhere near the size of some of the Latin American chapters. We have forged very strong links with them, and we follow their model a lot more closely than the U.S. model. The biggest chapter is probably about 150 members. They all meet, at a minimum, once a month. Interestingly, they are becoming the conduits not just for hackathons and “scrape-a-thons,” but are also now our local partners for implementing things like our data boot camps.
Those are week-long, intensive, hands-on experiential training sessions, where we’re flying in people from the Guardian data units, the Open Knowledge Foundation and from Google. We’re actually finding the guys behind Google Refine and Google Fusion Tables and flying in some of those people, so they can see end-users in a very different environment from what they’re used to. People walk into those boot camps not knowing what a spreadsheet is and, by the end of it, they’re producing their first elementary maps and visualizations. They’re crunching data.
What stories have “data boot camp” participants produced afterward?
Justin Arenstein: Here’s an example. We had a boot camp in Kenya. NTV, the national free-to-air station, had been looking into why young girls in a rural area of Kenya did very well academically until the ages of 11 or 12 — and then either dropped off the academic record completely or their academic performance plummeted. The explanation by the authorities and everyone else was that this was simply traditional; it’s tribal. Families are pulling them out of school to do chores and housework, and as a result, they can’t perform.
Irene Choge [a Kenyan boot camp participant who attended data journalism training] started mining the data. She came from that area and knew it wasn’t that [cause]. So she looked into public data. She first assumed it was cholera, so she looked into medical records. Nothing there. She then looked into water records. From water, she started looking into physical infrastructure and public works. She discovered that these schools had no sanitation facilities, and that the schools with the worst academic performance were those without sanitation facilities, specifically toilets.
What’s the connection?
Justin Arenstein: When these girls start menstruating, there’s nowhere for them to go to attend to themselves, other than into the bushes around the school. They were getting harassed and embarrassed. They either stopped going to school completely or they would stop going during that part of their cycle and, as a result, their schoolwork suffered dramatically. She then produced a TV documentary that provoked widespread public outcry and changed policies.
In addition to that, her newsroom is working on building an app. A parent who watches this documentary and is outraged will then be able to use the app to find out what’s happening at their daughter’s school. If their daughter’s school is one of those that has no facilities, the app then helps them, through a text-based service, to sign a petition to the responsible official to improve the situation, as well as link up with other outraged parents. It mobilizes them.
What we liked about her example was that it was more than just doing a visualization, which is what people think about when you say “data journalism.”
First, she used data tools to find hidden trends and stories and to solve a mystery. Secondly, she then did real old-fashioned journalism and went out in the field and confirmed the data wasn’t lying. The data was accurate.

Thirdly, she then used the data to give people the tools to actually act on the information. She’s using open data to tell people: in your district, this is your school, this is how you can affect it, this is the official you should be emailing or writing to about it. That demonstrates that, even in a country where most people access information through feature phones, data can still have a massive impact at grassroots level.
These are the kinds of successes that we are looking for in these kinds of outreach programs when it comes to open data.
How does the practice of data-driven journalism or the importance of computer-assisted reporting shift when a reporter can’t use rich media or deploy bandwidth-heavy applications?
Justin Arenstein: We’re finding something that maybe you’re starting to see inklings of elsewhere as well: data journalism doesn’t have to be the product. Data journalism can also be the route that you follow to get to a final story. It doesn’t have to produce an infographic or a map.
Maps are very good ways to organize information. They’re very poor mechanisms for consuming information. No one kicks back on a Sunday afternoon lying on their sofa, reading a map, but if a map triggers geofenced information and pushes relevant local information at you in your vicinity, then it becomes a useful mechanism.
What we’re doing in newsrooms is around investigative journalism. For example, we’re funding projects around extractive industries. We’re mapping out conversations and relationships between people. We’re then using those maps as analytical tools in the newsroom to arrive at better, deeper, evidence-driven reporting, the lack of which is a major weakness in many African media.
What capacity needs to be built in these areas? What are people doing now? What matters most?
Justin Arenstein: Investigative journalism in Africa, like in many other places, tends to be scoop-driven, which means that someone’s leaked you a set of documents. You’ve gone and you’ve verified them and often done great sleuth work. There are very few systematic, analytical approaches to analyzing broader societal trends. You’re still getting a lot of hit-and-run reporting. That doesn’t help us analyze the societies we’re in, and it doesn’t help us, more importantly, build the tools to make decisions.
The apps that we are helping people build, based on their reporting, are invariably not visualizations. They’re rather saying, “Let’s build a tool that augments the reporting, reflects the deeper data that the report is based on, and allows people to use that tool to make a personal decision.” It’s engendering action.
A lot of the fantastic work you’ve seen from people at the Guardian and others has been about telling complex stories simply via infographics, which is a valid but very different application of data journalism.
I think that, specifically in East Africa and in Southern Africa, there’s growing recognition that the media are important stewards of historical data. In many of these societies, including industrialized societies like South Africa, the state hasn’t been a really good curator of public data and public information because of its political history.
Nation states don’t see data as an asset? Is that because technical capacity isn’t there? Or is that because data actually contains evidence of criminality, corruption or graft?
Justin Arenstein: It’s often ineptitude and lack of resources in South Africa’s instance. In a couple of other countries, it’s systematic purging of information that is perhaps embarrassing when there’s a change of regime or political system — or in the case of South Africa and many of the colonial countries, a simple unwillingness or lack of insight as to the importance of collecting data about second-class citizens, largely the black population.
The official histories are very thin. There’s nowhere near the depth of nuance or insight into a society that you would find in the U.S. or in Europe, where there’s been very good archival record keeping. Often, the media are the only people who’ve really been keeping that kind of information, in terms of news reportage. It’s not brilliant. It’s often not primary sources — it’s secondary. But the point is that often it’s the only information that’s available.
What we’re doing is working with media companies now to help digitize and turn reportage into structured data. In a vacuum, because there is no other data, suddenly it becomes an important commercial commodity. Anyone who wants to build, for example, a tourism app or a transport app, will find that there is no other information available. This may sound like a bizarre concept to most people living in data-rich countries, like the U.S., but you simply can’t find the content. That means that you have to then go out and create the content yourself before you can build the app.
Is this a different sort of “data divide,” where a country is “data-poor”?
Justin Arenstein: Well, maybe digitally “data poor,” because what we are doing is we’re saying that there is data. We initially also had the same reaction, saying “there is no data here,” and then realized that there’s a hell of a lot of data. Invariably, it’s locked up in deadwood format. So [we're now] liberating that data, digitizing it, structuring it, and then making sure that it’s available for people to use.
How much are media entities you work with making data, as opposed to just digitizing?
Justin Arenstein: Some are making data. We haven’t, because a lot of other actors are involved in citizen data creation. We haven’t really focused too many of our very scarce resources on that component yet.
We are funding a couple of citizen reporting apps, because there’s a lot of hype around citizen data and we’re trying to see if there are models that can really work where you create credible, sourced and actionable information. We don’t believe that you’re going to be able to do that just from text messaging. We’re looking at alternative kinds of interfaces and methods for transmitting information.
Are there companies and startups that are consuming the digital data that you’re producing? If so, what are they doing?
Justin Arenstein: Outside of the News Challenge, we are co-founding something with the World Bank called “Code for Kenya.” It’s modeled fairly closely on the Knight-Mozilla OpenNews Fellowships, with a few tweaks. It’s maybe a hybrid of Code for America and the OpenNews Fellowships.
Where Code for America focuses on cities and Mozilla focuses on newsrooms, we’ve embedded open data strategists and evangelists into the newsrooms, backed up by an external development team at a civic tech lab. They’re structuring the data that’s available, such as turning old microfiche rolls into digital information, cleaning it up and building a data desk. They’re building news APIs and pushing the idea that, rather than building websites, newsrooms should design an API specifically for third-party repurposing of their content. We’re starting to see the first early successes. Four months in, some of the larger media groups in Kenya now have third-party entrepreneurs developing with their content and then doing revenue-share deals.
The only investment from the data holder, which is the media company, is to actually clean up the data and then make it available for development. Now, that’s not a new concept. The Guardian in the United Kingdom has experimented with it. It’s fairly exciting for these African companies because there’s potentially (and arguably) a larger appetite for the content, since there’s not as much content available. Suddenly, the unit value of that data is far higher than it might be in the U.K. or in the U.S.
Media companies are seriously looking at it as one of many potential future revenue streams. It enables them to repurpose their own data, start producing books and the rest of it. There isn’t much book publishing in Africa, by Africans, for Africans. Suddenly, if the content is available in an accessible format, it gives them an opportunity to mash up stuff and create new kinds of books.

They’ll start seeing that content itself can be a business model. The impact that we’re seeking there is to try and show media companies that investing in high-quality unique information actually gives you a long-term commodity that you can continue to reap benefits from over time. By contrast, simply pulling stuff off the wire or, as many media do in Africa, simply lifting it off the web, from the BBC or elsewhere, and crediting it, is not a good business model.