"data products" entries

What it means to “go pro” in data science

A look at what it takes to be a professional data science programmer.

My experience of being a data scientist is not at all like what I’ve read in books and blogs. I’ve read about data scientists working for digital superstar companies. They sound like heroes writing automated (near sentient) algorithms constantly churning out insights. I’ve read about MacGyver-like data scientist hackers who save the day by cobbling together data products from whatever raw material they have around.

The data products my team creates are not important enough to justify huge enterprise-wide infrastructures. It’s just not worth it to invest in hyper-efficient automation and production control. On the other hand, our data products influence important decisions in the enterprise, and it’s important that our efforts scale. We can’t afford to do things manually all the time, and we need efficient ways of sharing results with tens of thousands of people.

There are a lot of us out there — the “regular” data scientists; we’re more organized than hackers but with no need for a superhero-style data science lair. A group of us met and held a speed ideation event, where we brainstormed on the best practices we need to write solid code. This article is a summary of the conversation and an attempt to collect our knowledge, distill it, and present it in one place. Read more…

Six ways data journalism is making sense of the world, around the world

Early responses from our investigation into data-driven journalism had an international flavor.

When I wrote that Radar was investigating data journalism and asked for your favorite examples of good work, we heard back from around the world.

I received emails from Los Angeles, Philadelphia, Canada and Italy that featured data visualization, explored the role of data in government accountability, and shared how open data can revolutionize environmental reporting. A tweet pointed me to a talk about how R is being used in the newsroom. Another tweet linked to relevant interviews on social science and the media.

Two of the case studies focused on data visualization, an important practice that my colleague Julie Steele and other editors at O’Reilly Media have been exploring over the past several years.

Several other responses are featured at more length below. After you read through, make sure to also check out this terrific Ignite talk on data journalism recorded at this year’s Newsfoo in Arizona. Read more…

As digital disruption comes to Africa, investing in data journalism takes on new importance

Justin Arenstein is building the capacity of African media to practice data-driven journalism.

This interview is part of our ongoing look at the people, tools and techniques driving data journalism.

I first met Justin Arenstein (@justinarenstein) in Chişinău, Moldova, where the media entrepreneur and investigative journalist was working as a trainer at a “data boot camp” for journalism students. The long-haired, bearded South African instantly makes an impression with his intensity, good humor and focus on creating work that gives citizens actionable information.

Whenever we’ve spoken about open data and open government, Arenstein has been a fierce advocate for data-driven journalism that not only makes sense of the world for readers and viewers, but also provides them with tools to become more engaged in changing the conditions they learn about in the work.

He’s relentlessly focused on how open data can be made useful to ordinary citizens, from Africa to Eastern Europe to South America. For instance, in November, he highlighted how data journalism boosted voter registration in Kenya through a simple website built with modern web-based tools and technologies.

For the last 18 months, Arenstein has been working as a Knight International Fellow embedded with the African Media Initiative (AMI) as a director for digital innovation. The AMI is a group of the 800 largest media companies on the continent of Africa. In that role, Arenstein has been creating an innovation program for the AMI, building more digital capacity in countries that are as in need of effective accountability from the Fourth Estate as any in the world. Digital disruption hasn’t yet played itself out in Africa because of a number of factors, explained Arenstein, but he estimates that it will be there within five years.

“Media wants to be ready for this,” he said, “to try and avoid as much of the business disintegration as possible. The program is designed to help them grapple with and potentially leapfrog coming digital disruption.”

In the following interview, Arenstein discusses the African media ecosystem, the role of Hacks/Hackers in Africa, and expanding the capacity of data journalism. Read more…

Startup Showcase: And the finalists are …

A compelling crop of companies will present at the Strata Conference + Hadoop World Startup Showcase.

We had a wide range of startups apply for a slot in the Strata Conference + Hadoop World Startup Showcase. Our selection committee, which included investors, entrepreneurs, and executives from SAP — which is sponsoring the event — whittled these down to just a few, who will get a chance to strut their stuff in the Big Apple next week.

All sorts of early-stage firms applied, both those using data as a key differentiator, and those building the next-generation infrastructures that can handle the torrent of information our world produces. We also had applicants who visualize, communicate, and democratize, turning complex, chewy data into bite-sized, interactive nuggets that are easier to digest.

It’s a compelling crop of new entrants into today’s vibrant big data ecosystem, and we’re thrilled to welcome them to next week’s event, where Tim O’Reilly and Fred Wilson face the unenviable task of choosing the top three.

Startup Showcase finalists

New ethics for a new world

The biggest threat that a data-driven world presents is an ethical one.

Since the first of our ancestors chipped stone into a weapon, technology has divided us. Seldom more than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of prejudice masked as science.

On its surface, a data-driven society is more transparent, and makes better use of its resources. By connecting human knowledge, and mining it for insights, we can pinpoint problems before they become disasters, warding off disease and shining the harsh light of data on injustice and corruption. Data is making cities smarter, watering the grass roots, and improving the way we teach.

But for every accolade, there’s a cautionary tale. It’s easy to forget that data is merely a tool, and in the wrong hands, that tool can do powerful wrong. Data erodes our privacy. It predicts us, often with unerring accuracy — and treating those predictions as fact is a new, insidious form of prejudice. And it can collect the chaff of our digital lives, harvesting a picture of us we may not want others to know.

The big data movement isn’t just about knowing more things. It’s about a fundamental shift from scarcity to abundance. Most markets are defined by scarcity — the price of diamonds, or oil, or music. But when things become so cheap they’re nearly free, a funny thing happens.

Consider the advent of steam power. Economist Stanley Jevons, in what’s known as Jevons’ Paradox, observed that as the efficiency of steam engines increased, coal consumption went up. That’s not what was supposed to happen. Jevons realized that abundance creates new ways of using something. As steam became cheap, we found new ways of using it, which created demand.

The same thing is happening with data. A report that took a month to run is now just a few taps on a tablet. An unthinkably complex analysis of competitors is now a Google search. And the global distribution of multimedia content that once required a broadcast license is now an upload. Read more…

Top Stories: April 2-6, 2012

Data and context are always linked, data outputs beyond visualizations, state of the computer book market.

This week on O'Reilly: Mike Loukides explained why problems arise when data is taken out of social contexts, Robbie Allen looked at six ways insight can be extracted from datasets, and Mike Hendrickson analyzed the current state of the computer book market.

The evolution of data products

The data that drives products is shifting from overt to covert.

The real changes in our lives will come from products that have the richness of data without calling attention to the data.

New tools and techniques for applying climate data

A workshop shows early signs of climate scientists and data scientists coming together.

Climate cycles, machine learning and improved models were all part of the discussions at the first New York Academy of Sciences Workshop on Climate Informatics.

Strata Week: Green pigs and data

Rovio mines data to improve Angry Birds, HP bets on big data, Daily Dot parses the social web for stories.

Rovio, the company behind Angry Birds, is using data and analytics to keep bird-launching gamers plugged in. Also, HP's acquisition of Autonomy reveals its data intentions, and the Daily Dot finds stories with an assist from data journalism.

Data and the human-machine connection

Opera Solutions' Arnab Gupta says human plus machine always trumps human vs machine.

Managing data and extracting meaning require new approaches, new education, and even a new language. Opera Solutions CEO Arnab Gupta discusses each of these areas in the following interview.