The future of data at scale

The O'Reilly Radar Podcast: Turing Award winner Michael Stonebraker on the future of data science.


In March 2015, database pioneer Michael Stonebraker was awarded the 2014 ACM Turing Award “for fundamental contributions to the concepts and practices underlying modern database systems.” In this week’s Radar Podcast, O’Reilly’s Mike Hendrickson sits down with Stonebraker to talk about winning the award, the future of data science, and the importance — and difficulty — of data curation.

One size does not fit all

Stonebraker notes that since about 2000, everyone has realized they need a database system, across markets and across industries. “Now, it’s everybody who’s got a big data problem,” he says. “The business data processing solution simply doesn’t fit all of these other marketplaces.” Stonebraker talks about the future of data science — and data scientists — and the tools and skill sets that are going to be required:

It’s all going to move to data science as soon as enough data scientists get trained by our universities to do this stuff. It’s fairly clear to me that you’re probably not going to retread a business analyst to be a data scientist because you’ve got to know statistics, you’ve got to know machine learning. You’ve got to know what regression means, what Naïve Bayes means, what k-Nearest Neighbors means. It’s all statistics.

All of that stuff turns out to be defined on arrays. It’s not defined on tables. The tools of future data scientists are going to be array-based tools. Those may live on top of relational database systems. They may live on top of an array database system, or perhaps something else. It’s completely open.
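To make the "defined on arrays" point concrete, here is a minimal sketch of k-Nearest Neighbors written directly against an array of feature rows. The data, labels, and function names (`euclidean`, `knnPredict`) are illustrative assumptions, not anything described in the interview.

```javascript
// Feature matrix: each row is one observation, each column one feature.
// Values and labels are made up for illustration.
const X = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]];
const y = ['a', 'a', 'b', 'b'];

// Euclidean distance between two equal-length numeric vectors.
function euclidean(u, v) {
  return Math.sqrt(u.reduce((sum, ui, i) => sum + (ui - v[i]) ** 2, 0));
}

// Classify `query` by majority vote among its k nearest rows of X.
function knnPredict(X, y, query, k) {
  const nearest = X
    .map((row, i) => ({ label: y[i], dist: euclidean(row, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
  const votes = {};
  for (const { label } of nearest) {
    votes[label] = (votes[label] || 0) + 1;
  }
  // Pick the label with the most votes (ties go to the nearer neighbor's label).
  return Object.keys(votes).reduce((best, l) => (votes[l] > votes[best] ? l : best));
}
```

Every step here (distances, sorting, voting) operates on rows and columns of numbers, which is the sense in which such methods are array-based rather than table-based.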

Read more…


Practical advice for hardware startups

Entering the hardware space is easier than ever. Succeeding is a different matter.

Photo of 3D printing head by Jonathan Juursema via Wikimedia Commons. Used under a Creative Commons license.

Save 25% on registration for Solid with code SLD25. Solid is our conference on the convergence of software and hardware, and the Internet of Things.

Because of recent innovations in prototyping, crowdfunding, marketing, and manufacturing, it has never been easier — or cheaper — to launch a hardware startup than it is now. But while turning a hardware project into a product is now relatively easy, doing it successfully is still hard.

Renee DiResta and Ryan Vinyard, co-authors of The Hardware Startup, recently got together with Solid Conference chair Jon Bruner to discuss the startup landscape in hardware and the IoT, and what entrepreneurs need to know to build their businesses. Read more…


Building self-service tools to monitor high-volume time-series data

The O'Reilly Data Show Podcast: Phil Liu on the evolution of metric monitoring tools and cloud computing.

One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time was prompted, to a large extent, by my discussions with engineers and entrepreneurs building monitoring tools for IT operations. In many ways, data centers are perfect laboratories: they are controlled environments managed by teams willing to instrument devices and software and to monitor fine-grained metrics.

During a recent episode of the O’Reilly Data Show Podcast, I caught up with Phil Liu, co-founder and CTO of SignalFx, an SF Bay Area startup focused on building self-service monitoring tools for time series. We discussed hiring and building teams in the age of cloud computing, building tools for monitoring large numbers of time series, and lessons he’s learned from managing teams at leading technology companies.

Evolution of monitoring tools

Having worked at LoudCloud, Opsware, and Facebook, Liu has seen firsthand the evolution of real-time monitoring tools and platforms. He described watching the number of metrics grow to volumes that require large compute clusters:

One of the first services I worked on at LoudCloud was a service called MyLoudCloud. Essentially that was a monitoring portal for all LoudCloud customers. At the time, [the way] we thought about monitoring was still in a per-instance-oriented monitoring system. [Later], I was one of the first engineers on the operational side of Facebook and eventually became part of the infrastructure team at Facebook. When I joined, Facebook basically was using a collection of open source software for monitoring and configuration, so these are things that everybody knows — Nagios, Ganglia. It started out basically using just per-instance instant monitoring techniques, basically the same techniques that we used back at LoudCloud, but interestingly and very quickly as Facebook grew, this per-instance-oriented monitoring no longer worked because we went from tens or thousands of servers to hundreds of thousands of servers, from tens of services to hundreds and thousands of services internally.

Read more…


“The purpose of the IoT is to give humans superpowers”

Tim O’Reilly and Cory Doctorow talk about the opportunities and challenges presented by the Internet of Things.

In a recent conversation, Tim O’Reilly and Cory Doctorow addressed a wide variety of issues surrounding the evolution of the Internet of Things. We’ve distilled their conversation into a free downloadable report, “Opportunities and Challenges in the IoT: A Conversation with Tim O’Reilly and Cory Doctorow,” in which they address questions from folks on Twitter about security, the impact of the IoT on industry, IoT innovation in the public sector, and much more.

Taking a look at industry, O’Reilly addressed a Twitter question from @leahthehunter regarding which companies and technologies are most profoundly impacting the evolution of the Internet of Things:

I think the biggest mistake people make with the Internet of Things is in thinking that it’s about devices. Sure, there are sexy devices: your Nest thermostat. Your Internet-connected drone, or whatever, and people go, ‘Oh, awesome.’ Yes, and there’s things like smart TVs, but the biggest impact to me seems to be when you start thinking about how sensors and devices can change the way you actually do things. … What really seems interesting is if you take Uber as a model of an Internet of Things company and use that as your icon rather than, say, Nest, you say, ‘Oh wait a minute — what’s really happening here is we’re saying once you have connectivity and sensors out in the world, you can actually completely rethink an industry.’

O’Reilly noted that the biggest opportunities in the IoT lie not in new devices but in rethinking user behavior to design better user experiences and increase value for users. Doctorow agreed, pointing out that he’s interested in the notion of “treating human beings as things that are good at sensing and not things that are there to be sensed.” Read more…


The smartest way to program smart things: Node.js

The reasons to use Node.js for hardware are simple: it’s standardized, event driven, and has very high productivity.


Node.js is on the rise for programming hardware. The full Google V8 version helps run Intel’s Edison chip. The IoT community has already embraced Node.js for embedded devices and robotics, with notable examples including Nodebots and Cylon. And now, even smaller devices like Tessel 2 — a development platform for prototyping hardware — are using JavaScript.

Why is this a big deal? It makes programming hardware much simpler — college students can learn Node.js in a weekend. And it makes it possible to build and program an entire IoT device, from start to finish, in less than four hours. This may very well be the future of hardware programming.

Intel principal engineer Michael McCool will be at O’Reilly’s Solid Conference, June 23-25, 2015, to lead a workshop on using Node.js and HTML5 to program the Internet of Things. “In only three and a half hours, we’re going to walk people through building a complete and sophisticated IoT system,” McCool told me in an interview. That includes building a hardware prototype, hardware interfacing, streaming telemetry, building a UI on the phone, and creating an app. “The Web server part is just five lines of code. The rest of it is similarly simple,” he said. “The complete code is only about 200 lines on the embedded device, plus a little bit more…when you add in graphs of things for streaming data.” Read more…
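For a sense of how small a Node.js web server can be, here is a minimal sketch using Node's built-in `http` module. It is not McCool's workshop code: the `readSensor` function, its fixed reading, and the port in the closing comment are assumptions made for illustration.

```javascript
const http = require('http');

// Hypothetical sensor read; a real device would query hardware here.
function readSensor() {
  return { temperature: 21.5, timestamp: Date.now() };
}

// Answer every request with the latest reading as JSON.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(readSensor()));
}

const server = http.createServer(handleRequest);
// To start serving, call server.listen(8080); the port is an arbitrary choice.
```

Keeping the handler as a named function separate from `listen` makes it possible to exercise the response logic without opening a network port.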


What to see at Solid

A look at the interdisciplinary learning paths you'll find at Solid.


Flipping through the schedule for our Solid conference, you might wonder why we offer talks on synthetic biology in the same program that includes sessions on smart factories and how to ship goods within supply chains. The answer is that Solid is about a nascent movement — new hardware — that draws on a lot of different areas of expertise. It’s about access and the idea that physical things are becoming easier for anyone to create and engineer. Understanding hardware and the Internet of Things, then, is critical for every technologist and every company.

Solid’s program has emphasized interdisciplinary learning from the beginning; we’ve seen that a smart, accessible, connected world will need contributions from a lot of different backgrounds: designers, electrical engineers, software developers, executives, investors, entrepreneurs, researchers, and artists.

The keynotes that we’ve lined up will provide an overview, and a sense of how widely impactful this idea is; they touch on design, manufacturing, urban futures, synthetic biology, government, innovation, and techno-archaeology (a topic we’ve explored in the Solid Podcast). And they’ll wrap up with a thought-provoking talk, complete with a demo, on how we experience flavor.

After lunch on Wednesday and Thursday, the program gets broad.

I’ve drawn up a handful of paths that you might consider taking as you go through Solid next week. None of these is a comprehensive program, but they’ll serve as jumping-off points for different members of the new hardware community. Read more…