Strata Week: Machine learning vs domain expertise

Debating the data skills of machines and experts, a key data move for Microsoft, and Google Analytics gets social.

Here are a few of the data stories that caught my attention this week:

Debating the future of subject area expertise

The “Data Science Debate” panel at Strata California 2012. Watch the debate.

The Oxford-style debate at Strata continues to be one of the most-talked-about events from the conference. This week, it’s O’Reilly’s Mike Loukides who weighs in with his thoughts on the debate, which had the motion “In data science, domain expertise is more important than machine learning skill.” (For those who weren’t there, the machine learning side “won.” See Mike Driscoll’s summary and full video from the debate.)

Loukides moves from the unreasonable effectiveness of data to examine the “unreasonable necessity of subject experts.” He writes that:

“Whether you hire subject experts, grow your own, or outsource the problem through the application, data only becomes ‘unreasonably effective’ through the conversation that takes place after the numbers have been crunched … We can only take our inexplicable results at face value if we’re just going to use them and put them away. Nobody uses data that way. To push through to the next, even more interesting result, we need to understand what our results mean; our second- and third-order results will only be useful when we understand the foundations on which they’re based. And that’s the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can’t forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data. And those stories are going to be ever more essential as we use data to build increasingly complex systems.”

Microsoft hires former Yahoo chief scientist

Microsoft has hired Raghu Ramakrishnan as a technical fellow for its Server and Tools Business (STB), reports ZDNet’s Mary Jo Foley. According to his new company bio, Ramakrishnan’s work will involve “big data and integration between STB’s cloud offerings and the Online Services Division’s platform assets.”

Ramakrishnan comes to Microsoft from Yahoo, where he was the chief scientist for three divisions: Audience, Cloud Platforms and Search. As Foley notes, Ramakrishnan’s move is another indication that Microsoft is serious about “playing up its big data assets.” Strata chair Edd Dumbill examined Microsoft’s big data strategy earlier this year, noting in particular its work on a Hadoop distribution for Windows Server and Azure.

Analyzing the value of social media data

How much is your data worth? The Atlantic’s Alexis Madrigal does a little napkin math based on figures from the Internet Advertising Bureau to come up with a broad and ambiguous range between half a cent and $1,200 — depending on how you decide to make the calculation, of course.

In an effort to make those measurements easier and more useful, Google unveiled some additional reports as part of its Analytics product this week. It’s a move Google says will help marketers:

“… identify the full value of traffic coming from social sites and measure how they lead to direct conversions or assist in future conversions; understand social activities happening both on and off of your site to help you optimize user engagement and increase social key performance indicators (KPIs); and make better, more efficient data-driven decisions in your social media marketing programs.”

Engagement and conversion metrics for each social network will now be trackable through Google Analytics. Partners for this new Social Data Hub include Disqus, Echo, Reddit, Diigo, and Digg, among others.

Got data news?

Feel free to email me.

