January 2015 Archives

Getting started with data science in the cloud

Learn how to manipulate data, and construct and evaluate models in Azure ML, using a complete data science example.

Large-scale machine learning, or predictive analytics, is having a powerful impact across many industries. By using machine learning, companies, governments, and not-for-profits are replacing guesses and seat-of-the-pants estimates with valuable data-driven predictions.

Deriving value from machine learning, however, is often impeded by complex technology deployments and long model-development cycles. Fortunately, machine learning and data science are undergoing democratization. Workflow environments make tools for building and evaluating sophisticated machine learning models accessible to a wider range of users. Cloud-based environments provide secure ubiquitous access to data storage and powerful data science tools.

To get you started creating and evaluating your own machine learning models, O’Reilly has commissioned a new report: “Data Science in the Cloud, with Azure Machine Learning and R.” We use an in-depth data science example — predicting bicycle rental demand — to show you how to perform basic data science tasks, including data management, data transformation, machine learning, and model evaluation in the Microsoft Azure Machine Learning cloud environment. Using a free-tier Azure ML account, the example R scripts, and the data provided, the report gives you hands-on experience with this practical data science problem. Read more…
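
The report itself works in R inside the Azure ML environment; purely as a hypothetical Python sketch of the same modeling-and-evaluation idea (the file, feature names, and model choice here are invented, not taken from the report), the workflow looks roughly like this:

```python
# Hypothetical sketch of the report's workflow, not its actual code:
# fit a regression model to hourly bike-rental counts and evaluate it
# on held-out data.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Assumed CSV with columns like these; the report works with Azure ML datasets.
df = pd.read_csv("bike_rentals.csv")  # hypothetical file
features = ["hour", "temperature", "humidity", "windspeed", "is_workday"]
X, y = df[features], df["rental_count"]

# Simple time-ordered split: train on the first 80%, evaluate on the rest.
cut = int(len(df) * 0.8)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:cut], y[:cut])

print("MAE on held-out data:", mean_absolute_error(y[cut:], model.predict(X[cut:])))
```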

Who should and should not be talking to your fridge?

A reflection on the social impacts of smarter hardware in the physical world.

Attend Solid 2015 to explore the IoT’s impact on privacy and security.


Here’s the scenario today: I am out of milk, and my refrigerator sits there, mute and unsympathetic. Sometime in the ’90s, I was promised a fridge that would call the store when I was out of milk, and it would then be delivered while I, ignorant of my dearth of dairy, went about my business. Apparently such predictions were off. Someone forgot to tell my fridge manufacturer to put sensors, software, and networking gear into their products.

But there is hope. The dumb objects in the analog physical world are being slowly upgraded. From the very sexy telemetry systems in new BMWs to the very unsexy pallets of lettuce in a warehouse, Things That Heretofore Were Blind and Mute are getting eyes, ears, mouths, and in some cases, brains. This is evolution, not revolution, and while it is still slow-moving, it’s beneficial to reflect on some of the social impacts of smarter hardware in the physical world. Read more…

Four short links: 30 January 2015

FAA Rules, Sports UAVs, Woodcut Data, and Concurrent Programming

  1. FAA to Regulate UAVs? (Forbes) — and the Executive Order will segment the privacy issues related to drones into two categories — public and private. For public drones (that is, drones purchased with federal dollars), the President’s order will establish a series of privacy and transparency guidelines. See also How ESPN is Shooting the X Games with Drones (Popular Mechanics)—it’s all fun and games until someone puts out their eye with a quadrocopter. The tough part will be keeping within the tight restrictions the FAA gave them. Because drones can’t be flown above a crowd, Calcinari says, “We basically had to build a 500-foot radius around them, where the public can’t go.” The drones will fly over sections of the course that are away from the crowds, where only ESPN production employees will be. That rule is part of why we haven’t seen drones at college football games.
  2. Milestones for SaaS Companies — “Getting from $0-1m is impossible. Getting from $1-10m is unlikely. And getting from $10-100m is inevitable.” — Jason Lemkin, ex-CEO of Echosign. The article proposes some significant milestones, and they ring true. Making money is generally hard. The nature of the hard changes with the amount of money you have and the amount you’re trying to make, but if it were easy, then we’d structure our society on something else.
  3. Woodcut Data Visualisation — Recently, I learned how to operate a laser cutter. It’s been a whole lot of fun, and I wanted to share my experiences creating woodcut data visualizations using just D3. I love it when data visualisations break out of the glass rectangle.
  4. Why is Concurrent Programming Hard? — on the one hand there is not a single concurrency abstraction that fits all problems, and on the other hand the various different abstractions are rarely designed to be used in combination with each other. We are due for a revolution in programming, something to help us make sense of the modern systems made of more moving parts than our feeble grey matter can model and intuit about. (A small sketch of that combination problem follows this list.)
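
To make the combination problem concrete, here is a small illustrative sketch in Python (not from the linked piece): even getting a thread-pool task and an asyncio coroutine to cooperate in one program requires explicit bridging code.

```python
# Illustrative sketch: two common Python concurrency abstractions, threads
# and asyncio, are not designed to be used together directly; they need
# explicit glue (run_in_executor) before they can cooperate.
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_task(n: int) -> int:
    """A job written for the thread-pool world; it blocks, so a coroutine
    cannot await it directly."""
    time.sleep(0.1)
    return n * n

async def pipeline(numbers):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # run_in_executor is the bridge: it wraps each blocking call in a
        # future that the event loop knows how to await.
        return await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_task, n) for n in numbers)
        )

if __name__ == "__main__":
    print(asyncio.run(pipeline(range(5))))
```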

A human-centered approach to data-driven design

The O'Reilly Radar Podcast: Arianna McClain on humanizing data-driven design, and Dirk Knemeyer on design in emerging tech.

This week on the O’Reilly Radar Podcast, O’Reilly’s Roger Magoulas talks with Arianna McClain, a senior hybrid design researcher at IDEO, about storytelling through data; the interdependent nature of qualitative and quantitative data; and the human-centered, data-driven design approach at IDEO.

Subscribe to the O’Reilly Radar Podcast

iTunes, SoundCloud, RSS

In their interview, Magoulas noted that in our research at O’Reilly, we’ve been talking a lot about the importance of the social science design element in getting the most out of data. McClain emphasized the importance of storytelling through data at IDEO and described IDEO’s human-centered approach to data-driven design:

“IDEO really believes in staying and remaining human-centered throughout the data journey. Starting off with, how might we measure something, how might we measure a behavior. We don’t sit in a room and come up with an algorithm or come up with a question. We start by talking to people. … We’re trying to build measures and survey questions to understand at scale how people make decisions. … IDEO remains data-driven to how we analyze and synthesize our findings. When we’re given a large data set, we don’t analyze it and write a report and give it to people and say, ‘This is the direction we think you should go.’

“Instead, we look at segmentations in the data, and stories in the data, and how the data clusters. Then we go back, and we try to find people who are representative of that cluster or that segmentation. The segmentations, again, are not based on demographic variables. They are based on needs and insights that we heard in our qualitative research. … What we’ve recognized is that something that seems so clear in the analysis is often very nuanced, and it can inform our design.”
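
As a generic illustration of that “cluster, then go find representative people” step (a hypothetical sketch, not IDEO’s actual tooling or data), the quantitative half might look something like this in Python:

```python
# Hypothetical sketch: cluster survey responses, then surface the respondent
# closest to each cluster centre as a "representative" person to follow up
# with in qualitative interviews. The data here is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 5))  # invented need/behaviour scores

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(responses)

# For each cluster centroid, the index of the nearest actual respondent.
representatives = pairwise_distances_argmin(kmeans.cluster_centers_, responses)
print("Respondents to interview next:", representatives)
```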

Read more…

The evolution of GraphLab

The O'Reilly Data Show Podcast: Carlos Guestrin on the early days of GraphLab and the evolution of GraphLab Create.

I only really started playing around with GraphLab when the companion project GraphChi came onto the scene. By then I’d heard from many avid users and admired how their user conference instantly became a popular San Francisco Bay Area data science event. For this podcast episode, I sat down with Carlos Guestrin, co-founder/CEO of Dato, a start-up launched by the creators of GraphLab. We talked about the early days of GraphLab, the evolution of GraphLab Create, and what he’s learned from starting a company.

MATLAB for graphs

Guestrin remains a professor of computer science at the University of Washington, and GraphLab originated when he was still a faculty member at Carnegie Mellon. GraphLab was built by avid MATLAB users who needed to do large-scale graph computations to demonstrate their research results. Guestrin shared some of the backstory:

“I was a professor at Carnegie Mellon for about eight years before I moved to Seattle. A couple of my students, Joey Gonzalez and Yucheng Low, were working on large-scale distributed machine learning algorithms, especially with things called graphical models. We tried to implement them to show off the theorems that we had proven. We tried to run those things on top of Hadoop, and it was really slow. We ended up writing those algorithms on top of MPI, which is a high-performance computing library, and it was just a pain. It took a long time, it was hard to reproduce the results, and the impact it had on us is that writing papers became a pain. We wanted a system for my lab that allowed us to write more papers more quickly. That was the goal. In other words, so they could implement these machine learning algorithms more easily and more quickly, specifically on graph data, which is what we focused on.”
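
To make that kind of computation concrete, here is a toy, single-machine sketch of the vertex-centric update pattern that GraphLab-style systems distribute. It is purely illustrative; it is not GraphLab’s API, and the graph and damping factor are invented.

```python
# Toy sketch of a vertex-centric graph computation (a PageRank-style update):
# each vertex repeatedly recomputes its value from its neighbours' values.
# GraphLab-style systems parallelize exactly this pattern over large graphs.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}  # invented edges

def pagerank(graph, damping=0.85, iterations=20):
    rank = {v: 1.0 / len(graph) for v in graph}
    for _ in range(iterations):
        incoming = {v: 0.0 for v in graph}
        for v, neighbours in graph.items():
            share = rank[v] / len(neighbours) if neighbours else 0.0
            for n in neighbours:
                incoming[n] += share
        rank = {v: (1 - damping) / len(graph) + damping * incoming[v]
                for v in graph}
    return rank

print(pagerank(graph))
```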

Read more…

Four short links: 29 January 2015

Security Videos, Network Simulation, UX Book, and Profit in Perspective

  1. ShmooCon 2015 Videos — videos of security talks from ShmooCon 2015.
  2. Comcast (Github) — Comcast is a tool designed to simulate common network problems like latency, bandwidth restrictions, and dropped/reordered/corrupted packets. On BSD-derived systems such as OSX, we use tools like ipfw and pfctl to inject failure. On Linux, we use iptables and tc. Comcast is merely a thin wrapper around these controls. (A rough sketch of that wrapper idea follows this list.)
  3. The UX Reader — This ebook is a collection of the most popular articles from our [MailChimp] UX Newsletter, along with some exclusive content.
  4. Bad Assumptions — Apple lost more money to currency fluctuations than Google makes in a quarter.

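Regarding the Comcast entry above: the “thin wrapper” idea is simple to sketch. What follows is a hypothetical, stripped-down Python analogue (not the real tool’s code) that shells out to Linux tc/netem; it assumes root privileges and an interface named eth0.

```python
# Hypothetical, stripped-down analogue of a Comcast-style wrapper (not the
# real tool): inject artificial latency and packet loss on Linux by shelling
# out to `tc`/netem. Requires root; the interface name is an assumption.
import subprocess

INTERFACE = "eth0"  # substitute your own interface

def degrade(delay_ms: int = 250, loss_pct: float = 1.0) -> None:
    """Attach a netem qdisc that adds delay and random packet loss."""
    subprocess.run(
        ["tc", "qdisc", "add", "dev", INTERFACE, "root", "netem",
         "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%"],
        check=True,
    )

def restore() -> None:
    """Remove the netem qdisc and return the interface to normal."""
    subprocess.run(["tc", "qdisc", "del", "dev", INTERFACE, "root"], check=True)

if __name__ == "__main__":
    degrade()
    input("Network degraded; press Enter to restore... ")
    restore()
```
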
Four short links: 28 January 2015

Note and Vote, Gaming Behaviour, Code Search, and Immutabilate All The Things

  1. Note and Vote (Google Ventures) — nifty meeting hack to surface ideas and identify popular candidates to a decision maker.
  2. Applying Psychology to Improve Online Behaviour — online game runs massive experiments (w/researchers to validate findings) to improve the behaviour of their players. Some of Riot’s experiments are causing the game to evolve. For example, one product is a restricted chat mode that limits the number of messages abusive players can type per match. It’s a temporary punishment that has led to a noticeable improvement in player behavior afterward — on average, individuals who went through a period of restricted chat saw 20 percent fewer abuse reports filed by other players. The restricted chat approach also proved 4 percent more effective at improving player behavior than the usual punishment method of temporarily banning toxic players. Even the smallest improvements in player behavior can make a huge difference in an online game that attracts 67 million players every month.
  3. Hound — open source code search tool from Etsy.
  4. Immutability Changes Everything (PDF) — This paper is simply an amuse-bouche on the repeated patterns of computing that leverage immutability.

Avoid disruption through exploration

Support experimentation and continuously evaluate to stay ahead.

Businesses have always come and gone, but these days it seems that companies can fall from market dominance to bankruptcy in the blink of an eye. Kodak, Blockbuster and HMV are just a few recent victims of the rapid market disruption that defines the current era.

Where did these once iconic companies go wrong? To my mind, they forgot to keep challenging their assumptions about what business they were actually in.

Businesses have two options when they plan for the road ahead: they can put all their eggs into one basket, and risk losing everything if that basket has a hole in the bottom, or they can make a number of small bets, accepting that some will fail while others succeed.

Taking the latter approach, and making many small bets on innovation, transforms the boardroom into a roulette table. Unlike a punter in a casino, however, businesses cannot afford to stop making bets.

Business models are transient and prone to disruption by changes in markets and the external competitive environment, advances in design and technology, and wider social and economic change. Organizations that misjudge their purpose, or cannot sense and then adapt to these changes, will perish.

The sad truth is that too many established organizations focus most of their time and resources on executing and optimizing their existing business models in order to maximize profits. They forget to experiment and explore new ideas for the customer needs of tomorrow.

Read more…

Four short links: 27 January 2015

Autonomous Corporations, Abstract Thought, Down Rounds, and Distributed Messaging

  1. Decentralised Autonomous Corporations — Charlie Stross’s near-future fiction of Accelerando comes closer to reality: Malice – revenge for waking him up – sharpens Manfred’s voice. “The president of agalmic.holdings.root.184.97.AB5 is agalmic.holdings.root.184.97.201. The secretary is agalmic.holdings.root.184.D5, and the chair is agalmic.holdings.root.184.E8.FF. All the shares are owned by those companies in equal measure, and I can tell you that their regulations are written in Python. Have a nice day, now!” He thumps the bedside phone control and sits up, yawning, then pushes the do-not-disturb button before it can interrupt again. After a moment he stands up and stretches, then heads to the bathroom to brush his teeth, comb his hair, and figure out where the lawsuit originated and how a human being managed to get far enough through his web of robot companies to bug him.
  2. Coding is Not the New Literacy (Chris Granger) — We build mental models of everything – from how to tie our shoes to the way macro-economic systems work. With these, we make decisions, predictions, and understand our experiences. If we want computers to be able to compute for us, then we have to accurately extract these models from our heads and record them. Writing Python isn’t the fundamental skill we need to teach people. Modeling systems is. Amen!
  3. Let’s Stop Laughing at Groupon (Fortune) — it is much easier to survive a valuation decline as a public company than as a private one.
  4. nsq — Bitly’s open sourced realtime distributed messaging platform.

It’s not just about Hadoop core anymore

For maximum business value, big data applications have to involve multiple Hadoop ecosystem components.

Data is deluging today’s enterprise organizations from ever-expanding sources and in ever-expanding formats. To gain insight from this valuable resource, organizations have been adopting Apache Hadoop with increasing momentum. Now, the most successful players in big data enterprise are no longer only utilizing Hadoop “core” (i.e., batch processing with MapReduce), but are moving toward analyzing and solving real-world problems using the broader set of tools in an enterprise data hub (often interactively) — including components such as Impala, Apache Spark, Apache Kafka, and Search. With this new focus on workload diversity comes an increased demand for developers who are well-versed in using a variety of components across the Hadoop ecosystem.
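
As a small illustration of what driving one of those components from code can look like, here is a hypothetical Python sketch that runs an Impala query and pulls the result into pandas. The impyla client, host, table, and column names are assumptions for the example, not something the article prescribes.

```python
# Hypothetical sketch: query Impala from Python via the impyla client and
# hand the result to pandas for further analysis. Host, port, table, and
# column names are invented placeholders.
from impala.dbapi import connect
from impala.util import as_pandas

conn = connect(host="impala-daemon.example.com", port=21050)
cursor = conn.cursor()
cursor.execute("""
    SELECT product_category, COUNT(*) AS orders
    FROM web_orders
    WHERE order_date >= '2015-01-01'
    GROUP BY product_category
    ORDER BY orders DESC
""")

df = as_pandas(cursor)  # downstream steps (Spark jobs, a Hue dashboard, etc.) start here
print(df.head())
```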

Due to the size and variety of the data we’re dealing with today, a single use case or tool — no matter how robust — can camouflage the full, game-changing potential of Hadoop in the enterprise. Rather, developing end-to-end applications that incorporate multiple tools from the Hadoop ecosystem, not just the Hadoop core, is the first step toward activating the disparate use cases and analytic capabilities of which an enterprise data hub is capable. Whereas MapReduce code primarily leverages Java skills, developers who want to work on full-scale big data engineering projects need to be able to work with multiple tools, often simultaneously. An authentic big data applications developer can ingest and transform data using Kite SDK, write SQL queries with Impala and Hive, and create an application GUI with Hue. Read more…