"data product" entries

Big data is helping EA level up

Electronic Arts CTO Rajat Taneja on big data's growing role in the video game world.

Electronic Arts (EA) isn’t the first company that comes to mind when you think of big data. Yet the gaming company is collecting increasing amounts of data about its online players, and as this data accumulates and gains steam, it falls under the big data category.

If a game maker like EA is considered a big data company, it could have implications for other companies we might not think of as typical big data generators. With that in mind, I got in touch with Rajat Taneja, chief technology officer at EA and a keynote speaker at the upcoming Strata Conference in California. Since Taneja came on board with EA in 2011, he’s helped steer the company’s technological initiatives, including understanding the impact this growing data store will have on the firm — both from a processing standpoint and how to use it to provide games and services customers want most. He says no matter what your company does, if you have constantly connected online services, you are very likely going to be dealing with lots of data.

Our interview follows. Read more…

Now available: Big Data Now 2012 Edition

O'Reilly's annual data anthology explores the maturation of big data and data science.

In the first edition of our free Big Data Now anthology, the O’Reilly team tracked the birth and early development of data tools and data science. Now, with the second edition, we’re seeing what happens when big data grows up: how it’s being applied, where it’s playing a role, and the consequences, good and bad alike, of data’s ascendance.

We’ve organized the 2012 edition of Big Data Now into five areas:

Getting Up to Speed With Big Data — Essential information on the structures and definitions of big data.

Big Data Tools, Techniques, and Strategies — Expert guidance for turning big data theories into big data products.

The Application of Big Data — Examples of big data in action, including a look at the downside of data.

What to Watch for in Big Data — Thoughts on how big data will evolve and the role it will play across industries and domains.

Big Data and Health Care — A special section exploring the possibilities that arise when data and health care come together.

You can download free editions of Big Data Now 2012 in PDF, Mobi and EPUB formats here. The 2011 edition is also available.

Strata Rx is a wrap

Watch live keynotes from this week's Strata Rx Conference in San Francisco.

The intersection of big data and health care was explored at the O’Reilly Strata Rx Conference. The event has concluded, but you can still access an archive of videos, photos, and speaker slides. Read more…

Statwing simplifies data analysis

Quickly perform and interpret the results of routine Small Data analysis

With so much focus on Big Data, the needs of many analysts who work with Small Data tend to get ignored. The default tools for many of these users remain spreadsheets (1) and statistical packages, which come with a lot of features and options. However, many analysts need only a very small subset of what these tools have to offer.

Enter Statwing, a software-as-a-service provider for routine statistical analysis. While the tool is still in the early stages, it can already do many basic “data analysis” tasks.

Consider the following example of a pivot table constructed in Excel: this required 8 mouse-clicks, if you do everything perfectly, and about 5 decisions (what variables to include, what metric to use, …)

The same task in Statwing required 4 mouse-clicks and 0 decisions! Plus it comes with visuals:

The lack of clutter and the addition of a simple “headline” (“Female tends to have much higher values for satisfaction than Male”) make the result much easier to interpret. The advanced tab contains detailed statistical analysis (in this case the p-value, counts, values). Many users get confused by the output produced by traditional statistical software. Let’s face it, many analysts have had little training in statistics. I welcome a tool that produces readily interpretable results.
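To make the idea concrete, here is a minimal Python sketch of the kind of summary described above: a pivot of mean satisfaction by gender, followed by a plain-language headline. The survey rows, the function names `pivot_mean` and `headline`, and the headline wording are all hypothetical illustrations. This is not Statwing's implementation, and it skips the significance test that the advanced tab reports.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey rows: (gender, satisfaction on a 1-5 scale).
responses = [
    ("Female", 5), ("Female", 4), ("Female", 5), ("Female", 4),
    ("Male", 3), ("Male", 2), ("Male", 4), ("Male", 3),
]

def pivot_mean(rows):
    """Group values by category and return {category: (count, mean)}."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {cat: (len(vals), mean(vals)) for cat, vals in groups.items()}

def headline(summary, metric):
    """Name the group with the highest mean and the group with the lowest."""
    ranked = sorted(summary, key=lambda cat: summary[cat][1], reverse=True)
    return f"{ranked[0]} tends to have higher values for {metric} than {ranked[-1]}"

summary = pivot_mean(responses)
print(headline(summary, "satisfaction"))
```

A real tool would also check whether the difference is statistically meaningful before printing a headline. The sketch only shows how little of a full statistical package the routine case needs.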

The company hopes to replicate the above example across a wide variety of routine data analysis tasks. Their initial focus is on tools for (consumer) survey analysis, a potentially huge market given that online companies have made surveys so much easier to conduct. Users of Statwing pay a small monthly subscription, making it cheaper than most (2) statistical packages, and its intuitive UI lets analysts get their tasks done quickly. More importantly, Statwing may nurture aspiring data scientists in your organization.


(1) As this recent Strata presentation points out: Spreadsheets are the glue that keeps many organizations together.

(2) Open source tools like OpenOffice, R and Octave are free. So is the use of Google spreadsheets.

A startup takes on “the paper problem” with crowdsourcing and machine learning

With a new mobile app and API, Captricity wants to build a better bridge between analog and digital.

Unlocking data from paper forms is the problem that optical character recognition (OCR) software is supposed to solve. Two issues persist, however. First, the hardware and software involved are expensive, creating challenges for cash-strapped nonprofits and government. Second, all of the information on a given document is scanned into a system, including sensitive details like Social Security numbers and other personally identifiable information. This is a particularly difficult issue with respect to health care or bringing open government to courts: privacy by obscurity will no longer apply.

The process of converting paper forms into structured data still hasn’t been significantly disrupted by rapid growth of the Internet, distributed computing and mobile devices. Fields that range from research science to medicine to law to education to consumer finance to government all need better, cheaper bridges from the analog to the digital sphere.

Enter Captricity. The startup, which was co-founded by Jeff J. Lin and Kuang Chen, has its roots in the fieldwork on rural health Chen did as part of his PhD program.

“I was looking at the information systems that were available to these low-resource organizations,” Chen said in a recent phone interview. “I saw that they’re very much bound in paper. There’s actually a lot of efforts to modernize the infrastructure and put in mobile phones. Now that there’s mobile connectivity, you can run a health clinic on solar panels and long distance Wi-Fi. At the end of the day, however, business processes are still on paper because they had to be essentially fail-proof. Technology fails all the time. From that perspective, paper is going to stick around for a very long time. If we’re really going to tackle the challenge of the availability of data, we shouldn’t necessarily be trying to change the technology infrastructure first — bringing mobile phones and iPads to where there’s paper — but really to start with solving the paper problem.”

When Chen saw that data entry was a chokepoint for digitizing health indicators, he started working on developing a better, cheaper way to ingest data on forms. Read more…

A search for balance between the “wow” and “aha” in visualizations

Bitsy Bentley on the work behind a good visualization and why she hopes users will take data interactions for granted.

Because of the size, complexity and density of big data, it’s not always easy to find the important insights hiding in all that information. That’s where data visualization comes into play. A great visualization creates meaning where none existed.

Bitsy Bentley (@bitsybot) is the director of data visualization at GfK Custom Research, where she works with information designers to craft meaningful data experiences for a variety of business audiences. In the following interview, she discusses the space between a “wow” response and an “aha” moment, how her team addresses privacy concerns, and why practice is vital for both visualization creators and viewers.

Bentley will explore related visualization topics during her presentation at Strata Conference + Hadoop World in New York City later this month.

Why are data visualizations an effective way to understand the underlying data?

Bitsy Bentley: There is so much beauty and richness in big datasets, and now that we have enough processing power to harness that richness, it’s little wonder that interest in data visualization is exploding. To quote John Tukey: “The greatest value of a picture is when it forces us to notice what we never expected to see.” My clients find that, whether they’re more concerned with numbers or more concerned with stories, an appropriate visual is integral to their understanding of the data.

Visualization unlocks the serendipity of data analysis. It provides a language that is less intimidating than an overwhelming array of digits. Something as simple as a set of histograms breaking down the distribution of a data store makes it easy to find irregularities and outliers in the data. Read more…
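As a rough illustration of the histogram point, the following Python sketch buckets a small set of integers into fixed-width bins and prints one text bar per bin. The values, the bin width, and the function name `text_histogram` are assumptions made for this example. They are not from Bentley's work.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Bucket integer values into fixed-width bins; return one text bar per bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin between the smallest and largest so empty bins stay visible.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>4}-{start + bin_width - 1:<4}"
        lines.append(f"{label} {'#' * counts[start]}")
    return lines

# Hypothetical measurements with one value far from the rest.
values = [12, 15, 17, 18, 21, 22, 23, 25, 26, 28, 31, 33, 95]
lines = text_histogram(values, 10)
print("\n".join(lines))
```

Keeping the empty bins in the output is deliberate: the gap between the main cluster and the lone bar at the top of the range is what makes the outlier easy to spot.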

Apple’s maps

Apple's maps problem isn't about software or design. It's about data.

I promise not to make any snarky remarks about Apple’s maps disaster, and the mistakes of letting a corporate vendetta get in the way of good business decisions. Oops, I lied. But it’s good to see that Tim Cook agrees, at least about the quality of the maps. It’s humbling for a company like Apple to issue an apology.

The real issue isn’t the apology, but what happens next. Google seems to be in no hurry to submit a maps app. It’s unclear how much patience Apple’s customers have; on my Android phone, I probably use Google Maps more than anything else. Not having public transit information when I’m in New York would be a deal breaker for me. I suspect Apple’s fans are more loyal, but even that has limits. How long can the fanboys wait?

One article put Apple’s mapping efforts 400 years behind Google. That’s a lot of catch up. And Google certainly isn’t standing still: their addition of underwater photography to “street view” is spectacular, and may serve us well when sea levels rise. But that’s not the point, either. Apple doesn’t have to “catch up” to Google, though I’m sure they’d like to. They just have to get a product that’s good enough. I don’t think that’s a three-to-six-month proposition. But it could be done in a year or two. Read more…

Data Jujitsu: The art of turning data into product

Smart data scientists can make big problems small.

Having worked in academia, government and industry, I’ve had a unique opportunity to build products in each sector. Much of this product development has been around building data products. Just as methods for general product development have steadily improved, so have the ideas for developing data products. Thanks to large investments in the general area of data science, many major innovations (e.g., Hadoop, Voldemort, Cassandra, HBase, Pig, Hive, etc.) have made data products easier to build. Nonetheless, data products are unique in that they are often extremely difficult, and seemingly intractable for small teams with limited funds. Yet, they get solved every day.

How? Are the people who solve them superhuman data scientists who can come up with better ideas in five minutes than most people can in a lifetime? Are they magicians of applied math who can cobble together millions of lines of code for high-performance machine learning in a few hours? No. Many of them are incredibly smart, but meeting big problems head-on usually isn’t the winning approach. There’s a method to solving data problems that avoids the big, heavyweight solution and instead concentrates on building something quickly and iterating. Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small.

We call this Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable. It’s related to Wikipedia’s definition of the ancient martial art of jujitsu: “the art or technique of manipulating the opponent’s force against himself rather than confronting it with one’s own force.”

How do we apply this idea to data? What is a data problem’s “weight,” and how do we use that weight against itself? These are the questions that we’ll work through in the subsequent sections.

Read more…

From smartphones and continuous data comes the social MRI

Dr. Nadav Aharony used phone sensors to explore personal behaviors and community trends.

It’s clear at this point that the smartphone revolution has very little to do with the phone function in these devices. Rather, it’s the unique mix of sensors, always-on connectivity and mass consumer adoption that’s shaping business and culture.

Dr. Nadav Aharony (@nadavaha) tapped into this mix when he was working on a “social MRI” study in MIT’s Media Lab. Aharony, who recently joined us as part of our ongoing foo interview series, described his vision of the social MRI:

“If you think about it, the three things you take with you when you go out of your home are your keys, your wallet and your phone, so our phones are always with us. In aggregate, we can use the phones in many people’s pockets as a virtual imaging chamber. So, one aspect of the social MRI is this virtual imaging chamber that is collecting tens or hundreds of signals at the same time from members of the community.” [Discussed at 1:16]

Aharony’s work focused on 150 participants (about 75 families) who were given phones for 15 months. During that time, more than one million hours of “continuous sensing data” were gathered with the participants’ consent. The data was acquired and scrubbed under MIT’s ethics guidelines, and for good measure, Aharony included his own data in the dataset.

Collecting the data was just the beginning. Parsing that information and creating experiments based on emerging signals is where the applications of a social MRI became significant.
Read more…

Stories over spreadsheets

Kris Hammond on replacing rows and columns with sentences and paragraphs.

Imagine a future where clear language supplants spreadsheets. In a recent interview, Narrative Science CTO Kris Hammond explained how we might get there.