"machine learning" entries

We make the software, you make the robots

An interview with Andreas Mueller, on scikit-learn and usable machine learning software.

Get notified when our free report “Evaluating Machine Learning Models: A beginner’s guide to key concepts and pitfalls,” by Alice Zheng, is available for download.


Superpixels example from Andreas Mueller’s thesis paper (PDF), used with permission.

A few weeks ago, I had the pleasure of sitting down (virtually, over Skype) with Andreas Mueller, core developer and maintainer of the popular scikit-learn machine learning library. We had previously bonded over our shared goals of making useful machine learning software, so I jumped at the chance to interview him.

Mueller wears many hats at work. He is one of the key maintainers of the popular Python machine learning library scikit-learn. Holding a doctorate in computer vision from the University of Bonn in Germany, he currently works on open science at New York University’s Center for Data Science. He speaks at conferences around the world and has a fanbase of 5,000+ followers on Twitter and about as many reputation points on Stack Overflow. In other words, this man has got mad street cred. He started out doing pure math in academia, and has now achieved software developer cult idol status. Read more…

Four short links: 4 August 2015

Four short links: 4 August 2015

Data-Flow Graphing, Realtime Predictions, Robot Hotel, and Open-Source RE

  1. Data-flow Graphing in Python (Matt Keeter) — not shared because data-flow graphing is sexy new hot topic that’s gonna set the world on fire (though, I bet that’d make Matt’s day), but because there are entire categories of engineering and operations migraines that are caused by not knowing where your data came from or goes to, when, how, and why. Remember Wirth’s “algorithms + data structures = programs”? Data flows seem like a different slice of “programs.” Perhaps “data flow + typos = programs”?
  2. Machine Learning for Sports and Real-time Predictions (Robohub) — podcast interview for your commute. Real time is gold.
  3. Japan’s Robot Hotel is Serious Business (Engadget) — hotel was architected to suit robots: For the porter robots, we designed the hotel to include wide paths.” Two paths slope around the hotel lobby: one inches up to the second floor, while another follows a gentle decline to guide first-floor guests (slowly, but with their baggage) all the way to their room. Makes sense: at Solid, I spoke to a chap working on robots for existing hotels, and there’s an entire engineering challenge in navigating an elevator that you wouldn’t believe.
  4. bokken — GUI to help open source reverse engineering for code.
Four short links: 31 July 2015

Four short links: 31 July 2015

Robot Swarms, Google Datacenters, VR Ecosystem, and DeepDream Visualised

  1. Buzz: An Extensible Programming Language for Self-Organizing Heterogeneous Robot Swarms (arXiv) — Swarm-based primitives allow for the dynamic management of robot teams, and for sharing information globally across the swarm. Self-organization stems from the completely decentralized mechanisms upon which the Buzz run-time platform is based. The language can be extended to add new primitives (thus supporting heterogeneous robot swarms), and its run-time platform is designed to be laid on top of other frameworks, such as Robot Operating System.
  2. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network (PDF) — Our datacenter networks run at dozens of sites across the planet, scaling in capacity by 100x over 10 years to more than 1Pbps of bisection bandwidth. Wow, their Wi-Fi must be AMAZING!
  3. Nokia’s VR Ambitions Could Restore Its Tech Lustre (Bloomberg) — the VR ecosystem map is super-interesting.
  4. Visualising GoogleNet Classes — fascinating to see squirrel monkeys and basset hounds emerge from nothing. It’s so tempting to say, “this is what the machine sees in its mind when it thinks of basset hounds,” even though Boring Brain says, “that’s bollocks and you know it!”
Four short links: 30 July 2015

Four short links: 30 July 2015

Catalogue Data, Git for Data, Computer-Generated Handwriting, and Consumer Robots

  1. A Sort of Joy — MOMA’s catalogue was released under CC license, and has even been used to create new art. The performance is probably NSFW at your work without headphones on, but is hilarious. Which I never thought I’d say about a derivative work of a museum catalogue. (via Courtney Johnston)
  2. dat goes beta — the “git for data” goes beta. (via Nelson Minar)
  3. Computer Generated Handwriting — play with it here. (via Evil Mad Scientist Labs)
  4. Japanese Telcos vie for Consumer Robot-as-a-Service Business (Robohub) — NTT says Sota will be deployed in seniors’ homes as early as next March, and can be connected to medical devices to help monitor health conditions. This plays well with Japanese policy to develop and promote technological solutions to its aging population crisis.

Understanding neural function and virtual reality

The O'Reilly Data Show Podcast: Poppy Crum explains that what matters is efficiency in identifying and emphasizing relevant data.


Like many data scientists, I’m excited about advances in large-scale machine learning, particularly recent success stories in computer vision and speech recognition. But I’m also cognizant of the fact that press coverage tends to inflate what current systems can do, and their similarities to how the brain works.

During the latest episode of the O’Reilly Data Show Podcast, I had a chance to speak with Poppy Crum, a neuroscientist who gave a well-received keynote at Strata + Hadoop World in San Jose. She leads a research group at Dolby Labs and teaches a popular course at Stanford on Neuroplasticity in Musical Gaming. I wanted to get her take on AI and virtual reality systems, and hear about her experience building a team of researchers from diverse disciplines.

Understanding neural function

While it can sometimes be nice to mimic nature, in the case of the brain, machine learning researchers recognize that understanding and identifying the essential neural processes is much more critical. A related example cited by machine learning researchers is flight: wing flapping and feathers aren’t critical, but an understanding of physics and aerodynamics is essential.

Crum and other neuroscience researchers express the same sentiment. She points out that a more meaningful goal should be to “extract and integrate relevant neural processing strategies when applicable, but also identify where there may be opportunities to be more efficient.”

The goal in technology shouldn’t be to build algorithms that mimic neural function. Rather, it’s to understand neural function. … The brain is basically, in many cases, a Rube Goldberg machine. We’ve got this limited set of evolutionary building blocks that we are able to use to get to a sort of very complex end state. We need to be able to extract when that’s relevant and integrate relevant neural processing strategies when it’s applicable. We also want to be able to identify that there are opportunities to be more efficient and more relevant. I think of it as table manners. You have to know all the rules before you can break them. That’s the big difference between being really cool or being a complete heathen. The same thing kind of exists in this area. How we get to the end state, we may be able to compromise, but we absolutely need to be thinking about what matters in neural function for perception. From my world, where we can’t compromise is on the output. I really feel like we need a lot more work in this area. Read more…

Comments: 2
Four short links: 27 July 2015

Four short links: 27 July 2015

Google’s Borg, Georgia v. Malamud, SLAM-aware system, and SmartGPA

  1. Large-scale Cluster Management at Google with BorgGoogle’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters, each with up to tens of thousands of machines. […] We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
  2. Georgia Sues Carl Malamud (TechDirt) — for copyright infringement… for publishing an official annotated copy of the state's laws. […] the state points directly to the annotated version as the official laws of the state.
  3. Monocular SLAM Supported Object Recognition (PDF) — a monocular SLAM-aware object recognition system that is able to achieve considerably stronger recognition performance, as compared to classical object recognition systems that function on a frame-by-frame basis. (via Improving Object Recognition for Robots)
  4. SmartGPA: How Smartphones Can Assess and Predict Academic Performance of College Students (PDF) — We show that there are a number of important behavioral factors automatically inferred from smartphones that significantly correlate with term and cumulative GPA, including time series analysis of activity, conversational interaction, mobility, class attendance, studying, and partying.
Comment: 1

Data has a shape

Using topology to uncover the shape of your data: An interview with Gurjeet Singh.


Get notified when our free report, “Future of Machine Intelligence: Perspectives from Leading Practitioners,” is available for download. The following interview is one of many that will be included in the report.

As part of our ongoing series of interviews surveying the frontiers of machine intelligence, I recently interviewed Gurjeet Singh. Singh is CEO and co-founder of Ayasdi, a company that leverages machine intelligence software to automate and accelerate discovery of data insights. Author of numerous patents and publications in top mathematics and computer science journals, Singh has developed key mathematical and machine learning algorithms for topological data analysis.

Key Takeaways

  • The field of topology studies the mapping of one space into another through continuous deformations.
  • Machine learning algorithms produce functional mappings from an input space to an output space and lend themselves to be understood using the formalisms of topology.
  • A topological approach allows you to study data sets without assuming a shape beforehand and to combine various machine learning techniques while maintaining guarantees about the underlying shape of the data.

David Beyer: Let’s get started by talking about your background and how you got to where you are today.

Gurjeet Singh: I am a mathematician and a computer scientist, originally from India. I got my start in the field at Texas Instruments, building integrated software and performing digital design. While at TI, I got to work on a project using clusters of specialized chips called Digital Signal Processors (DSPs) to solve computationally hard math problems.

As an engineer by training, I had a visceral fear of advanced math. I didn’t want to be found out as a fake, so I enrolled in the Computational Math program at Stanford. There, I was able to apply some of my DSP work to solving partial differential equations and demonstrate that a fluid dynamics researcher need not buy a supercomputer anymore; they could just employ a cluster of DSPs to run the system. I then spent some time in mechanical engineering building similar GPU-based partial differential equation solvers for mechanical systems. Finally, I worked in Andrew Ng’s lab at Stanford, building a quadruped robot and programming it to learn to walk by itself. Read more…

Comments: 2
Four short links: 15 July 2015

Four short links: 15 July 2015

OpeNSAurce, Multimaterial Printing, Functional Javascript, and Outlier Detection

  1. System Integrity Management Platform (Github) — NSA releases security compliance tool for government departments.
  2. 3D-Printed Explosive Jumping Robot Combines Firm and Squishy Parts (IEEE Spectrum) — Different parts of the robot grade over three orders of magnitude from stiff like plastic to squishy like rubber, through the use of nine different layers of 3D printed materials.
  3. Professor Frisby’s Mostly Adequate Guide to Functional Programming — a book on functional programming, using Javascript as the programming language.
  4. Tracking Down Villains — the software and algorithms that Netflix uses to detect outliers in their infrastructure monitoring.
Four short links: 8 July 2015

Four short links: 8 July 2015

Encrypted Databases, Product Management, Patenting Machine Learning, and Programming Ethics

  1. Zero Knowledge and Homomorphic Encryption (ZDNet) — coverage of a few startups working on providing databases that don’t need to decrypt the data they store and retrieve.
  2. How Not to Suck at Making ProductsNever confuse “category you’re in” with the “value you deliver.” Customers only care about the latter.
  3. Google Patenting Machine Learning Developments (Reddit) — I am afraid that Google has just started an arms race, which could do significant damage to academic research in machine learning. Now it’s likely that other companies using machine learning will rush to patent every research idea that was developed in part by their employees. We have all been in a prisoner’s dilemma situation, and Google just defected. Now researchers will guard their ideas much more combatively, given that it’s now fair game to patent these ideas, and big money is at stake.
  4. Machine Ethics (Nature) — machine learning ethics versus rule-driven ethics. Logic is the ideal choice for encoding machine ethics, argues Luís Moniz Pereira, a computer scientist at the Nova Laboratory for Computer Science and Informatics in Lisbon. “Logic is how we reason and come up with our ethical choices,” he says. I disagree with his premises.
Four short links: 6 July 2015

Four short links: 6 July 2015

DeepDream, In-Flight WiFi, Computer Vision in Preservation, and Testing Distributed Systems

  1. DeepDream — the software that’s been giving the Internet acid-free trips.
  2. In-Flight WiFi Business — numbers and context for why some airlines (JetBlue) have fast free in-flight wifi while others (Delta) have pricey slow in-flight wifi. Four years ago ViaSat-1 went into geostationary orbit, putting all other broadband satellites to shame with 140 Gbps of total capacity. This is the Ka-band satellite that JetBlue’s fleet connects to, and while the airline has to share that bandwidth with homes across of North America that subscribe to ViaSat’s Excede residential broadband service, it faces no shortage of capacity. That’s why JetBlue is able to deliver 10-15 Mbps speeds to its passengers.
  3. British Library Digitising Newspapers (The Guardian) — as well as photogrammetry methods used in the Great Parchment Book project, Terras and colleagues are exploring the potential of a host of techniques, including multispectral imaging (MSI). Inks, pencil marks, and paper all reflect, absorb, or emit particular wavelengths of light, ranging from the infrared end of the electromagnetic spectrum, through the visible region and into the UV. By taking photographs using different light sources and filters, it is possible to generate a suite of images. “We get back this stack of about 40 images of the [document] and then we can use image-processing to try to see what is in [some of them] and not others,” Terras explains.
  4. Testing a Distributed System (ACM) — This article discusses general strategies for testing distributed systems as well as specific strategies for testing distributed data storage systems.