Robotic Garden, Kids Toys, MSFT ML, and Twitter Scale

  1. The Distributed Robotic Garden (MIT) — We consider plants, pots, and robots to be systems with different levels of mobility, sensing, actuation, and autonomy. (via Robohub)
  2. CogniToys Leverages Watson’s Brain to Befriend, Teach Your Kids (IEEE) — Through the dino, Watson’s algorithms can get to know each child that it interacts with, tailoring those interactions to the child’s age and interests.
  3. How Machine Learning Ate Microsoft (Infoworld) — Azure ML didn’t merely take the machine learning algorithms MSR had already handed over to product teams and stick them into a drag-and-drop visual designer. Microsoft has made the functionality available to developers who know the R statistical programming language and Python, which together are widely used in academic machine learning. Microsoft plans to integrate Azure ML closely with Revolution Analytics, the R startup it recently acquired.
  4. Handling Five Billion Sessions a Day in Real Time (Twitter) — infrastructure porn.

Exploring methods in active learning

Tips on how to build effective human-machine hybrids, from crowdsourcing expert Adam Marcus.

In a recent O’Reilly webcast, “Crowdsourcing at GoDaddy: How I Learned to Stop Worrying and Love the Crowd,” Adam Marcus explains how to mitigate common challenges of managing crowd workers, how to make the most of human-in-the-loop machine learning, and how to establish effective and mutually rewarding relationships with workers. Marcus is the director of data on the Locu team at GoDaddy, where the “Get Found” service provides businesses with a central platform for managing their online presence and content.

In the webcast, Marcus uses practical examples from his experience at GoDaddy to show how to:

  • Offset the inevitability of wrong answers from the crowd
  • Develop and train workers through a peer-review system
  • Build a hierarchy of trusted workers
  • Make crowd work inspiring and enable upward mobility

What to do when humans get it wrong

It turns out there is a simple way to offset human error: ask several people the same question. Marcus explains that when you ask five different people the same question, there are some creative ways to combine their responses, such as taking a majority vote.
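The redundancy idea above can be sketched in a few lines. This is a minimal illustration, not Marcus's or GoDaddy's actual implementation: the function name `majority_vote` and the sample answers are hypothetical, and returning no winner on a tie is an assumed policy.

```python
from collections import Counter


def majority_vote(answers):
    """Combine redundant crowd answers to one question.

    Returns (winner, agreement), where agreement is the share of
    workers who gave the most common answer. The winner is None when
    the top two answers are tied (an assumed policy for this sketch).
    """
    if not answers:
        raise ValueError("need at least one answer")
    ranked = Counter(answers).most_common()
    top_answer, top_count = ranked[0]
    agreement = top_count / len(answers)
    # A tie between the two most common answers means no clear majority.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement
    return top_answer, agreement


# Five hypothetical workers answer the same question.
winner, agreement = majority_vote(["open", "open", "closed", "open", "closed"])
print(winner, agreement)
```

The agreement score is worth keeping alongside the winner: a question with low agreement is a natural candidate for asking more workers or escalating to a trusted reviewer.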

Speech Recognition, Predictive Analytic Queries, Video Chat, and Javascript UI Library

  1. The Uncanny Valley of Speech Recognition (Zach Holman) — I’m reminded of driving up US-280 in 2003 or so with @raelity, a Kiwi and a South African trying every permutation of American accent from Kentucky to Yosemite Sam in order to get TellMe to stop giving us the weather for zipcode 10000. It didn’t recognise the swearing either. (Caution: features similarly strong language.)
  2. TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries (PDF) — an integrated PAQ [Predictive Analytic Queries] planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching. The resulting system, TUPAQ, solves the PAQ planning problem with comparable accuracy to exhaustive strategies but an order of magnitude faster, and can scale to models trained on terabytes of data across hundreds of machines.
  3. p2pvc — point-to-point video chat. In an 80×25 terminal window.
  4. Sortable — nifty UI library.

Active Learning, Tongue Sensors, Cybernetic Management, and HTML5 Game Publishing

  1. Real World Active Learning — the point at which algorithms fail is precisely where there’s an opportunity to insert human judgment to actively improve the algorithm’s performance. An O’Reilly report with CrowdFlower.
  2. Hearing With Your Tongue (BoingBoing) — The tongue contains thousands of nerves, and the region of the brain that interprets touch sensations from the tongue is capable of decoding complicated information. “What we are trying to do is another form of sensory substitution,” Williams said.
  3. The Art of Management — cybernetics and management.
  4. kiwi.js — a mobile & desktop browser based HTML5 game framework. It uses CocoonJS for publishing to the AppStore.

Human-in-the-loop machine learning

Practical machine-learning applications and strategies from experts in active learning.

What do you call a practice that most data scientists have heard of, few have tried, and even fewer know how to do well? It turns out, no one is quite certain what to call it. In our latest free report Real-World Active Learning: Applications and Strategies for Human-in-the-Loop Machine Learning, we examine the relatively new field of “active learning” — also referred to as “human computation,” “human-machine hybrid systems,” and “human-in-the-loop machine learning.” Whatever you call it, the field is exploding with practical applications that are proving the efficiency of combining human and machine intelligence.

Learn from the experts

Through in-depth interviews with experts in the field of active learning and crowdsource management, industry analyst Ted Cuzzillo reveals top tips and strategies for using short-term human intervention to actively improve machine models. As you’ll discover, the point at which a machine model fails is precisely where there’s an opportunity to insert — and benefit from — human judgment.
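One common way to find "the point at which a machine model fails" is to look at the model's own confidence and send only the uncertain cases to people. The sketch below assumes that approach; it is not taken from the report. The function name `route_predictions`, the 0.8 threshold, and the sample predictions are all hypothetical.

```python
def route_predictions(predictions, threshold=0.8):
    """Split model output into auto-accepted labels and a human-review queue.

    predictions: iterable of (item, label, confidence) triples, where
    confidence is the model's probability for its chosen label.
    Items below the threshold go to humans, least confident first.
    """
    accepted = []
    uncertain = []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item, label))
        else:
            uncertain.append((confidence, item))
    # Review the least confident items first: they are the cases where
    # a human label is most likely to change the outcome.
    uncertain.sort(key=lambda pair: pair[0])
    needs_review = [item for _, item in uncertain]
    return accepted, needs_review


# Hypothetical model output for four business listings.
preds = [
    ("listing-a", "match", 0.95),
    ("listing-b", "no-match", 0.55),
    ("listing-c", "match", 0.30),
    ("listing-d", "no-match", 0.80),
]
accepted, needs_review = route_predictions(preds)
print(accepted)
print(needs_review)
```

Labels collected from the review queue can then be added to the training set, so that human effort is spent on the examples the model currently handles worst.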

Find out:

  • When active learning works best
  • How to manage crowdsource contributors (including expert-level contributors)
  • Basic principles of labeling data
  • Best-practice methods for assessing labels
  • When to skip the crowd and mine your own data

Explore real-world examples

This report gives you a behind-the-scenes look at how human-in-the-loop machine learning has helped improve the accuracy of Google Maps, match business listings at GoDaddy, rank top search results at Yahoo!, refer relevant job postings to people on LinkedIn, identify expert-level contributors using the Quizz recruitment method, and recommend women’s clothing based on customer and product data at Stitch Fix.

Weather Forecasting, Better Topic Modelling, Cyberdefense, and Facebook Warriors

  1. Global Forecast System — National Weather Service open sources its weather forecasting software. Hope you have a supercomputer and all the data to make use of it …
  2. High-reproducibility and high-accuracy method for automated topic classification — Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
  3. Army Open Sources Cyberdefense Code — git push is the new “for immediate release”.
  4. British Army Creates Team of Facebook Warriors (The Guardian) — no matter how much I know the arguments for it, it still feels vile.

Going Offline, AI Ethics, Human Risks, and Deep Learning

  1. Reset (Rowan Simpson) — It was a bit chilling to go back over a whole year’s worth of tweets and discover how many of them were just junk. Visiting the water cooler is fine, but somebody who spends all day there has no right to talk of being full.
  2. Google’s AI Brain — on the subject of Google’s AI ethics committee … Q: Will you eventually release the names? A: Potentially. That’s something also to be discussed. Q: Transparency is important in this too. A: Sure, sure. Such reassuring.
  3. AVA is now Open Source (Laura Bell) — Assessment, Visualization and Analysis of human organisational information security risk. AVA maps the realities of your organisation, its structures and behaviors. This map of people and interconnected entities can then be tested using a unique suite of customisable, on-demand, and scheduled information security awareness tests.
  4. Deep Learning for Torch (Facebook) — Facebook AI Research open sources faster deep learning modules for Torch, a scientific computing framework with wide support for machine learning algorithms.

Designed-In Outrage, Continuous Data Processing, Lisp Processors, and Anomaly Detection

  1. The Toxoplasma of Rage — It’s in activists’ interests to destroy their own causes by focusing on the most controversial cases and principles, the ones that muddy the waters and make people oppose them out of spite. And it’s in the media’s interest to help them and egg them on.
  2. Samza: LinkedIn’s Stream-Processing Engine — Samza’s goal is to provide a lightweight framework for continuous data processing. Unlike batch processing systems such as Hadoop, which typically have high-latency responses (sometimes hours), Samza continuously computes results as data arrives, which makes sub-second response times possible.
  3. Design of LISP-Based Processors (PDF) — 1979 MIT AI Lab memo on design of hardware specifically for Lisp. Legendary subtitle! LAMBDA: The Ultimate Opcode.
  4. rAnomalyDetection — Twitter’s R package for detecting anomalies in time-series data. (via Twitter Engineering blog)

Complex Addresses, AI Applications, Scaling Diversity, Audiovisual Coding

  1. Falsehoods Programmers Believe About Addresses — 0 Egmont Road, Middlesbrough. lolwut?
  2. Future of the AI-Powered Application (Matt Turck) — we’re about to witness the emergence of a number of deeply focused AI-powered applications that will achieve commercial success by solving in a definitive manner very specific issues. (via Matt Webb)
  3. Three Things a City In Charge of its Destiny Ought to Know About Software (Matt Edgar) — Instead of asking “will it scale”, ask a better question: “Does it gracefully handle massive diversity?” […] The diversity question accommodates scaling; the scaling question tramples all over diversity. (via Tom Armitage)
  4. gibber — a creative coding environment for audiovisual performance and composition. It contains features for audio synthesis and musical sequencing, 2d drawing, 3d scene construction and manipulation, and live-coding shaders. If you’re looking for more ways to interest teens in code …

IoT Protocols, Predictive Limits, Machine Learning and Security, and 3D-Printing Electronics

  1. Exploring the Protocols of the Internet of Things (Sparkfun) — Arduino and Arduino-like IoT “things” especially, with their limited flash and SRAM, can benefit from specially crafted IoT protocols.
  2. Complexity Salon: Ebola (willowbl00) — These notes were taken at the 2014.Dec.18 New England Complex Systems Institute Salon focused on Ebola. […] Why don’t we engage in risks in a more serious way? Everyone thinks their prior experience indicates what will happen in the future. Look at past Ebola! It died down before going far, surely it won’t be bad in the future.
  3. Machine Learning Methods for Computer Security (PDF) — papers on topics such as adversarial machine learning, attacking pattern recognition systems, data privacy and machine learning, machine learning in forensics, and deceiving authorship detection.
  4. voxel8 — Using Voxel8’s 3D printer, you can co-print matrix materials such as thermoplastics and highly conductive silver inks enabling customized electronic devices like quadcopters, electromagnets and fully functional 3D electromechanical assemblies.