Strata Newsletter: January 19, 2012

Data scientists need agility and an entrepreneurial outlook. Plus: data news of note.


"A so-called ‘blackout’ is yet another gimmick, albeit a dangerous one, designed to punish elected and administration officials who are working diligently to protect American jobs from foreign criminals."
—Chris Dodd, CEO of the MPAA, firing off a belt full of buzzwords at the SOPA/PIPA protests

Top-of-the-List Thinking from Edd and Alistair

Some days it feels as if wrapping your head around the very concept of data, large or small, is impossible. In a series of fresh posts for O’Reilly Radar, we’ve been trying to bring some clarity to the space. In a new piece titled “What is Big Data?” Edd writes: “The emergence of big data into the enterprise brings with it a necessary counterpart: agility. Successfully exploiting the value in big data requires experimentation and exploration. Whether creating new products or looking for ways to gain competitive advantage, the job calls for curiosity and an entrepreneurial outlook.”

We’re always aiming to stay agile in every way. Hope you are, too.


Edd Dumbill & Alistair Croll
Chairs, Strata

Strata Conference 2012

Making Data Work

February 28 – March 1, 2012 | Santa Clara, CA


Use code NEWS20 to save 20%.

Tracks include: Data Science, Business & Industry, Visualization & Interface, Hadoop & Big Data, Policy & Privacy, and Domain Data.

Register Now & Save

Quick Bytes

Short Items of Massive Interest

“When I heard that Tyler Gray at Public Knowledge was looking for someone to do some analysis on tweets that mentioned SOPA, I thought I might try Cytoscape (an open source tool used for biomedical research, but handy for large-scale data visualization) to show some of the relationships between people discussing the controversial bill on Twitter,” writes Fred Benenson. The result is astounding, though Benenson admits that analyzing data that large is more an art than a science.

Data Science as a Sport
Want to determine which new car is the safest vehicle, how much dark matter is in the universe, and which patients are at risk of going to the hospital this year? So did Allstate Insurance, NASA, and the Heritage Health Foundation. Each did the same thing: they appealed to the 20,000 or so data scientists who make Kaggle their place to blow off steam, their downtime diversion, and the arena of their fiercest competition. Jeremy Howard of Kaggle will present at the upcoming Strata Conference, discussing how his site brings together the best and brightest minds in data science while everyone else is still scratching their heads over how to find a single data scientist.

So Negative It’s Necessary?
Considering the social use of negative feedback, Peter Coffee sees an unusual tension: “As we move into the era of the social enterprise, we need to find convergence rather than contrast: we need to build social systems that are predictably useful and effective, which means we need to engineer them without making them inhuman.”

Possibility Thinking
The Cloud U group on LinkedIn is a robust community of well-informed folks who take huge delight in good-natured discourse. (Yes, they love to argue.) A recent thread on cloud complexity, and how to conceal it, prompted a rousing discussion. Admitting that he arrived somewhat late to this particular party, Alistair adds his two cents, harking back to 17th-century mathematics before bringing the discussion back to the present day: “Gottfried Leibniz spent a lot of time trying to figure out what the best of all possible worlds was. He concluded that it’s the one of plentitude, where the fewest starting conditions give us the most outcomes. In his words, the best world would ‘actualize every genuine possibility.’ I suspect Leibniz would have considered today’s connected, abstracted, service-oriented Internet a better world than yesterday’s islands of client-server and mainframe computing. Biology and cloud computing are complex. They’re messy. And according to Leibniz, they’re also better, because they allow more possibilities.”

Viz Biz Intel

Data vs Placebo

Funny how one thing leads unnaturally to another. O’Reilly author Stephen Few was listening to Science Friday, considering asthma, and pondering placebos when he was struck by something about data visualization. Perhaps, he thought, a clever visual can have as little effect as a mere placebo. He wondered: "If people enjoy your infographic, isn’t that enough? Or in the realm of information dashboards, if the CEO has fun looking at the flashy gauges, isn’t that enough? No, it isn’t. Both are meant to inform. To understand the story of an infographic or an organization’s performance on a dashboard requires real information. Enjoying a pretty picture and feeling like you’ve been informed is not the same as the actual understanding that’s needed to make better decisions." Made us think.

Dept. of Backbone

Despite its power, PostgreSQL has always been slightly overshadowed by the attention MySQL attracted, says Edd. But while we weren’t looking, it has quietly become a major backbone of database innovation, not least in big data products. Curt Monash’s article is an interesting catch-up on why PostgreSQL is important, and why it needs strong leadership.

This Week in StartUps

Just the Factuals

“I’ve just had a fascinating conversation with Gil Elbaz of Factual,” Edd recently wrote on his Google+ page. “If you’re a developer and want geo data for your apps, they’re a great place to look. As well as underpinning some very large web sites, they also offer a regular API to developers, which is free to trial. Gil’s emphasis isn’t just about selling data, though. Factual is engineered to be both a commercial concern and a data community. Consumers can also contribute, improving the data set for everyone. After talking with me, Gil headed over to This Week in Startups, where an archived episode features him discussing the CommonCrawl project with Nova Spivack.” It’s all well worth your notice.

Free Strata Online Conference

Towards the Quantified Society

Wednesday, Jan. 25, from 9am–11am PST


In this free online conference, slated for Wednesday, Jan. 25, we’ll look at hot topics, brewing controversies, and cutting-edge technologies that promise to change how we live, work, play, learn, and love. Hear from some of Strata’s marquee speakers about what’s occupying them in the big data world.

Strata Online Conference

Register Now for Free

The Final Bit



I’m quite impressed with the range and varied tone of content on PandoDaily, a new publication launched by ex-TechCrunch staffer Sarah Lacy, says Edd. Its editorial team is something like Internet royalty, enough to merit an infographic explaining who’s got what skin in this game. Good reading regardless.

Looking for more? Visit


In this Issue:

  • Top-of-the-List Thinking from Edd and Alistair
  • Strata Conference 2012
  • Quick Bytes
  • Free Strata Online Conference
  • The Final Bit


