"democratization of data" entries

We need open models, not just open data

If you really want to understand the effect data is having, you need the models.


Writing my post about AI and summoning the demon led me to re-read a number of articles on Cathy O’Neil’s excellent mathbabe blog. I highlighted a point Cathy has made consistently: if you’re not careful, modelling has a nasty way of enshrining prejudice with a veneer of “science” and “math.”

Cathy has consistently made another point that’s a corollary of her argument about enshrining prejudice. At O’Reilly, we talk a lot about open data. But it’s not just the data that has to be open: it’s also the models. (There are too many must-read articles on Cathy’s blog to link to; you’ll have to find the rest on your own.)

You can have all the crime data you want, all the real estate data you want, all the student performance data you want, all the medical data you want, but if you don’t know what models are being used to generate results, you don’t have much. Read more…

Curiosity turned loose on GitHub data

Ilya Grigorik's GitHub project shows what happens when questions, data, and tools converge.

I’m fascinated by people who:

1. Ask the question, “I wonder what happens if I do this?” and then follow it all the way through.

2. Start a project on a whim and open it up so anyone can participate.

Ilya Grigorik (@igrigorik) did both of these things, which is why our recent conversation at Strata Conference + Hadoop World was one of my favorite parts of the event.

By day, Grigorik is a developer advocate on Google’s Make the Web Fast team (he’s a perfect candidate for a future Velocity interview). On the side, he likes to track open source projects on GitHub. As he explained during our chat, this can be a time-intensive hobby:

“I follow about 3,000 open source projects, and I try to keep up with what’s going on, what are people contributing to, what are the new interesting sub-branches of work being done … The problem I ran into about six months ago was that, frankly, it was just too much to keep up with. The GitHub timeline was actually overflowing. In order to keep up, I would have to go in every four hours and scan through everything, and then repeat it. That doesn’t give you much time for sleep.” [Discussed 15 seconds into the interview.]

Grigorik built a system, including a newsletter, that lets him stay in the loop efficiently. He worked with GitHub to archive public GitHub activity, and he then made that data available in raw form and through Google BigQuery (the data is updated hourly).

This is a fun project, no doubt, but it’s also a big deal. Here’s why: When you shorten the distance between questions and answers, you empower people to ask more questions. It’s the liberation of curiosity, and that’s exactly what happened here. Read more…

Every company has a big data issue

GoodData's Roman Stanek on how big data applies to all businesses, including the small ones.

When you bandy about a term like “big data” often enough, it tends to lose its meaning. But big data is more than a marketing term, although it is that, too: it’s a means of trying to understand and control the sheer volume of information we are seeing inside and outside our organizations.

It’s easy to dismiss this as a problem for companies like Google and Facebook, which are gathering mountains of data from users. However, as GoodData CEO Roman Stanek (@RomanStanek) points out in the following interview, the growing amount of data from a variety of sources makes big data an issue that affects every company, regardless of size.

Stanek, who has been an entrepreneur for more than 20 years, started GoodData in 2007 as a way to simplify business intelligence by putting it in the cloud. Today, he sees big data as more than a business intelligence problem. Having watched his business evolve, he believes companies like his can take big data out of the realm of data scientists and put it into the hands of ordinary business users.

There is a perception that big data is a big company problem. What role does big data have in small- to medium-size organizations?

Roman Stanek: Big data comes from hundreds of sources, most of which are outside a company’s firewalls, such as customer interactions, social media and emails. A company’s size is irrelevant to the volume of big data it has to manage and understand. For example, a company with 100 employees may have to answer thousands of customer-support calls coming in from Facebook, Twitter, email and telephone. That’s a massive amount of data it has to deal with.

In addition, big data represents tremendous potential wealth for all companies, no matter how small or large those enterprises are. When businesses are smart about leveraging data, they can make better and faster business decisions. Read more…

Data as seeds of content

A look at lesser-known ways to extract insight from data.

Visualizations are one way to make sense of data, but they aren't the only way. Robbie Allen reveals six additional outputs that help users derive meaningful insights from data.

Data's next steps

RedMonk's Steve O'Grady weighs in on data's pressing issues.

RedMonk analyst Steve O'Grady discusses the demand for data scientists, the problem of using data to ask the right questions, and why you shouldn't rush into a NoSQL investment.

Top stories: February 20-24, 2012

Data for the public good, the coming health IT revolution, big data in the cloud.

This week on O'Reilly: Alex Howard examined data's civic role, Dr. Farzad Mostashari discussed health IT and patient empowerment, and Edd Dumbill surveyed big data cloud offerings.