- FAA to Regulate UAVs? (Forbes) — the Executive Order will segment the privacy issues related to drones into two categories — public and private. For public drones (that is, drones purchased with federal dollars), the President’s order will establish a series of privacy and transparency guidelines. See also How ESPN is Shooting the X Games with Drones (Popular Mechanics) — it’s all fun and games until someone puts an eye out with a quadcopter. The tough part will be keeping within the tight restrictions the FAA gave them. Because drones can’t be flown above a crowd, Calcinari says, “We basically had to build a 500-foot radius around them, where the public can’t go.” The drones will fly over sections of the course that are away from the crowds, where only ESPN production employees will be. That rule is part of why we haven’t seen drones at college football games.
- Milestones for SaaS Companies — “Getting from $0-1m is impossible. Getting from $1-10m is unlikely. And getting from $10-100m is inevitable.” —Jason Lemkin, ex-CEO of EchoSign. The article proposes some significant milestones, and they ring true. Making money is generally hard. The nature of the hard changes with the amount of money you have and the amount you’re trying to make, but if it were easy, then we’d structure our society on something else.
- Woodcut Data Visualisation — Recently, I learned how to operate a laser cutter. It’s been a whole lot of fun, and I wanted to share my experiences creating woodcut data visualizations using just D3. I love it when data visualisations break out of the glass rectangle. (A rough sketch of the cut-file idea follows this list.)
- Why is Concurrent Programming Hard? — on the one hand, there is no single concurrency abstraction that fits all problems, and on the other, the various abstractions are rarely designed to be used in combination with each other. We are due for a revolution in programming, something to help us make sense of the modern systems made of more moving parts than our feeble grey matter can model and intuit about. (A small example of that composition problem also appears below.)
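On the woodcut item: the post's pipeline is D3 in the browser, but the core trick, turning a dataset into cut-ready vector output, is easy to sketch. Below is a stdlib-only Python stand-in that writes a small bar chart as thin stroked rectangles, the form most laser-cutter software treats as cut lines; the data values and physical dimensions here are invented for illustration.

```python
# Sketch: turn a small data series into an SVG of cut paths for a laser
# cutter. (The original post used D3; this is a dependency-free Python
# stand-in. Cutters typically cut along thin stroked vector paths.)

data = [4, 8, 15, 16, 23, 42]          # hypothetical values to visualize
bar_w, gap, scale = 10.0, 4.0, 2.0     # millimetres

rects = []
for i, v in enumerate(data):
    x = i * (bar_w + gap)
    h = v * scale
    # SVG's y origin is at the top, so bars "hang" from a 100 mm baseline
    rects.append(
        '<rect x="{:.1f}" y="{:.1f}" width="{:.1f}" height="{:.1f}" '
        'fill="none" stroke="red" stroke-width="0.1"/>'.format(
            x, 100 - h, bar_w, h))

svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="200mm" '
       'height="100mm" viewBox="0 0 200 100">\n  '
       + '\n  '.join(rects) + '\n</svg>')

with open('bars.svg', 'w') as f:
    f.write(svg)
```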
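On the concurrency item: the complaint that abstractions are rarely designed to compose is concrete enough to demonstrate. This minimal Python sketch (my illustration, not anything from the article) bridges a blocking, thread-pool call into asyncio; the explicit run_in_executor glue is the seam between two models that don't know about each other.

```python
# Two concurrency abstractions, one program: asyncio coroutines on top,
# a thread pool underneath. Neither knows about the other, so the
# programmer must ferry work across the boundary by hand.
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(n):
    time.sleep(0.1)  # stands in for a blocking library call
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # run_in_executor is the explicit bridge between the two worlds
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_io, n) for n in range(4)))
    print(results)  # [0, 1, 4, 9]

asyncio.run(main())
```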
Learn how to manipulate data, and construct and evaluate models in Azure ML, using a complete data science example.
Deriving value from machine learning, however, is often impeded by complex technology deployments and long model-development cycles. Fortunately, machine learning and data science are undergoing democratization. Workflow environments make tools for building and evaluating sophisticated machine learning models accessible to a wider range of users. Cloud-based environments provide secure, ubiquitous access to data storage and powerful data science tools.
To get you started creating and evaluating your own machine learning models, O’Reilly has commissioned a new report: “Data Science in the Cloud, with Azure Machine Learning and R.” We use an in-depth data science example — predicting bicycle rental demand — to show you how to perform basic data science tasks, including data management, data transformation, machine learning, and model evaluation, in the Microsoft Azure Machine Learning cloud environment. With a free-tier Azure ML account, the example R scripts, and the data provided, the report gives you hands-on experience with this practical data science example. Read more…
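The report itself works in R inside Azure ML; purely as a rough illustration of the modeling task it walks through, here is a minimal Python sketch of a bike-demand regression. The file name and column names are assumptions based on typical bike-share data, not the report's actual schema.

```python
# Minimal sketch of the report's task (predicting bicycle rental demand)
# in plain Python; the report itself uses R scripts inside Azure ML.
# File and column names ("temp", "humidity", "hour", "cnt") are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("bike_rentals.csv")      # hypothetical file name
X = df[["temp", "humidity", "hour"]]      # feature columns (assumed)
y = df["cnt"]                             # rentals per hour (assumed)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```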
A reflection on the social impacts of smarter hardware in the physical world.
Attend Solid 2015 to explore the IoT’s impact on privacy and security.
Here’s the scenario today: I am out of milk, and my refrigerator sits there, mute and unsympathetic. Sometime in the ’90s, I was promised a fridge that would call the store when I was out of milk, and it would then be delivered while I, ignorant of my dearth of dairy, went about my business. Apparently such predictions were off. Someone forgot to tell my fridge manufacturer to put sensors, software, and networking gear into their products.
But there is hope. The dumb objects in the analog physical world are being slowly upgraded. From the very sexy telemetry systems in new BMWs to the very unsexy pallets of lettuce in a warehouse, Things That Heretofore Were Blind and Mute are getting eyes, ears, mouths, and in some cases, brains. This is evolution, not revolution, and while it is still slow-moving, it’s beneficial to reflect on some of the social impacts of smarter hardware in the physical world. Read more…
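To make "sensors, software, and networking gear" concrete, here is a hypothetical sketch of a fridge-mounted sensor reporting its milk level over MQTT with the paho-mqtt client. The broker address, topic, and payload schema are all invented for illustration.

```python
# Hypothetical sketch of the missing piece in the fridge story: a sensor
# publishing one telemetry reading over MQTT. Broker, topic, and payload
# schema are made up. (paho-mqtt 1.x style; 2.x also expects a callback
# API version argument to Client().)
import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com", 1883)   # hypothetical broker

reading = {"device": "fridge-01", "milk_ml": 120, "ts": time.time()}
client.publish("home/kitchen/fridge", json.dumps(reading))
client.disconnect()
```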
The O'Reilly Data Show Podcast: Carlos Guestrin on the early days of GraphLab and the evolution of GraphLab Create.
I only really started playing around with GraphLab when the companion project GraphChi came onto the scene. By then I’d heard from many avid users and admired how their user conference instantly became a popular San Francisco Bay Area data science event. For this podcast episode, I sat down with Carlos Guestrin, co-founder/CEO of Dato, a start-up launched by the creators of GraphLab. We talked about the early days of GraphLab, the evolution of GraphLab Create, and what he’s learned from starting a company.
MATLAB for graphs
Guestrin remains a professor of computer science at the University of Washington, and GraphLab originated when he was still a faculty member at Carnegie Mellon. It was built by avid MATLAB users who needed to do large-scale graphical computations to demonstrate their research results. Guestrin shared some of the backstory:
“I was a professor at Carnegie Mellon for about eight years before I moved to Seattle. A couple of my students, Joey Gonzalez and Yucheng Low, were working on large-scale distributed machine learning algorithms, especially with things called graphical models. We tried to implement them to show off the theorems that we had proven. We tried to run those things on top of Hadoop, and it was really slow. We ended up writing those algorithms on top of MPI, which is a high-performance computing library, and it was just a pain. It took a long time, it was hard to reproduce the results, and the impact it had on us is that writing papers became a pain. We wanted a system for my lab that allowed us to write more papers more quickly. That was the goal. In other words, so they could implement these machine learning algorithms more easily and more quickly, specifically on graph data, which is what we focused on.”
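For a sense of where that goal ended up, this is roughly the experience GraphLab Create offers: a graph computation from Python in a few lines, with no MPI or Hadoop plumbing. The API details below are reconstructed from memory of the 2014-era library and may not match any particular release.

```python
# Roughly the experience GraphLab Create aimed for: graph analytics from
# Python without hand-written MPI or MapReduce. API details are from
# memory of the 2014-era library; treat as a sketch, not a reference.
import graphlab as gl

edges = gl.SFrame({'src': ['a', 'b', 'c', 'a'],
                   'dst': ['b', 'c', 'a', 'c']})
g = gl.SGraph().add_edges(edges, src_field='src', dst_field='dst')

pr = gl.pagerank.create(g)   # PageRank as a one-liner
print(pr['pagerank'])        # per-vertex scores, returned as an SFrame
```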
Support experimentation and continuously evaluate to stay ahead.
Businesses have always come and gone, but these days it seems that companies can fall from market dominance to bankruptcy in the blink of an eye. Kodak, Blockbuster, and HMV are just a few recent victims of the rapid market disruption that defines the current era.
Where did these once iconic companies go wrong? To my mind, they forgot to keep challenging their assumptions about what business they were actually in.
Businesses have two options when they plan for the road ahead: they can put all their eggs into one basket, and risk losing everything if that basket has a hole in the bottom, or they can make a number of small bets, accepting that some will fail while others succeed.
Taking the latter approach, and making many small bets on innovation, transforms the boardroom into a roulette table. Unlike a punter in a casino, however, businesses cannot afford to stop making bets.
Business models are transient and prone to disruption by changes in markets and the external competitive environment, advances in design and technology, and wider social and economic change. Organizations that misjudge their purpose, or cannot sense and then adapt to these changes, will perish.
The sad truth is that too many established organizations focus most of their time and resources on executing and optimizing their existing business models in order to maximize profits. They forget to experiment and explore new ideas that address the customer needs of tomorrow.
For maximum business value, big data applications have to involve multiple Hadoop ecosystem components.
Data is flooding into today’s enterprise organizations from ever-expanding sources and in ever-expanding formats. To gain insight from this valuable resource, organizations have been adopting Apache Hadoop with increasing momentum. Now, the most successful players in enterprise big data are no longer using only Hadoop “core” (i.e., batch processing with MapReduce), but are moving toward analyzing and solving real-world problems using the broader set of tools in an enterprise data hub (often interactively) — including components such as Impala, Apache Spark, Apache Kafka, and Search. With this new focus on workload diversity comes an increased demand for developers who are well versed in using a variety of components across the Hadoop ecosystem.
Due to the size and variety of the data we’re dealing with today, a single use case or tool — no matter how robust — can obscure the full, game-changing potential of Hadoop in the enterprise. Rather, developing end-to-end applications that incorporate multiple tools from the Hadoop ecosystem, not just the Hadoop core, is the first step toward activating the disparate use cases and analytic capabilities of which an enterprise data hub is capable. Whereas MapReduce code primarily requires Java skills, developers who want to work on full-scale big data engineering projects need to be able to work with multiple tools, often simultaneously. A true big data application developer can ingest and transform data using Kite SDK, write SQL queries with Impala and Hive, and create an application GUI with Hue. Read more…
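As one small taste of working across the ecosystem rather than against the core alone, the sketch below runs an interactive aggregate through Impala using the impyla client. The host, port, and table are placeholders, and the table is imagined to have been landed by an upstream ingest tool such as Kite SDK or Kafka.

```python
# Hedged sketch of the "multiple tools" point from the Python side: data
# landed in HDFS by one tool can be queried interactively through Impala.
# Host, port, and table name are placeholders.
from impala.dbapi import connect

conn = connect(host='impala-host.example.com', port=21050)  # hypothetical
cur = conn.cursor()
cur.execute("""
    SELECT product, SUM(quantity) AS total
    FROM sales_events          -- e.g. ingested via Kite SDK / Kafka
    GROUP BY product
    ORDER BY total DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```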