# "machine learning" entries

## Four short links: 12 October 2015

### Unattended Robots, Replicable Economics, Deep Learning Learnings, and TPP Problems

1. Acquiring Object Experiences at Scale — software to let a robot examine a pile of objects, unattended overnight.
2. Economics Apparently Not Replicable (PDF) — We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the six papers that use confidential data and the two papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.
3. 26 Things I Learned in the Deep Learning Summer School — 20. When Frederick Jelinek and his team at IBM submitted one of the first papers on statistical machine translation to COLING in 1988, they got the following anonymous review: “The validity of a statistical (information theoretic) approach to MT has indeed been recognized, as the authors mention, by Weaver as early as 1949. And was universally recognized as mistaken by 1950 (cf. Hutchins, MT – Past, Present, Future, Ellis Horwood, 1986, p. 30ff and references therein). The crude force of computers is not science. The paper is simply beyond the scope of COLING.”
4. The Final Leaked TPP Text is All That We Feared (EFF) — If you dig deeper, you’ll notice that all of the provisions that recognize the rights of the public are non-binding, whereas almost everything that benefits rightsholders is binding.

## Movement data is going to transform everything

### The O'Reilly Radar Podcast: Rajiv Maheswaran on the science of moving dots, and Claudia Perlich on big data in advertising.

In this week’s Radar Podcast episode, O’Reilly’s Mac Slocum chats with Rajiv Maheswaran, CEO of Second Spectrum. Maheswaran talks about machine learning applications in sports, the importance of context in measuring stats, and the future of real-time, in-game analytics.

Here are some highlights from their chat:

There’s a lot of parts of the game of basketball — pick and rolls, dribble hand-offs — that coaches really care about, about analyzing how it works on offense, how to guard them. Before big data and machine learning, people basically watched the games and marked them. It turns out that people are pretty bad at marking them accurately, and they also miss a ton of stuff. Right now, machine learning tells coaches, ‘This is how many pick and rolls these two players have had over the course of the season, how often they do all the different variations, what they’re good at, what they’re bad at.’ Coaches can really find tendencies that can help them play offense, play defense, far more efficiently, based off of machine learning.

What we’re doing is having the machine match human intuition. If I’m watching a game, I know that the shot is harder if I’m farther away, if I have multiple defenders, if they’re close, if they’re closing in on me, if I’m dribbling, the type of shot I’m taking. As a human, I watch this and I have an intuition about it. Now, by giving all that data to the machine, it can make a predictor that actually matches our intuition, and goes beyond it because it can put a number onto what our intuition tells us.

## Four short links: 6 October 2015

### System Intuition, Magic is Power, Predicting Behaviour, Payment Required

1. Flux: New Approach to System Intuition (LinkedIn) — In general, we assume that if anything is best represented numerically, then we don’t need to visualize it. If the best representation is a numerical one, then a visualization could only obscure a quantifiable piece of information that can be measured, compared, and acted upon. Anything that we can wrap in alerts or some threshold boundary should kick off some automated process. No point in ruining a perfectly good system by introducing a human into the mix. Instead of numerical information, we want a tool that surfaces relevant information to a human, for situations that would be too onerous to create a heuristic. These situations require an intuition that we can’t codify.
2. Jumping to the End: Practical Design Fiction (Vimeo) — “Magic is a power relationship” — Matt Jones on the flipside of hiding complex behaviours from users and making stuff “work like magic.” (via Richard Pope)
3. Predicting Daily Activities from Egocentric Images Using Deep Learning — aka “people wear cameras and we can figure out what they’re going to do next.”
4. 402: Payment Required (David Humphrey) — The ad blocking discussion highlights our total lack of imagination, where a browser’s role is reduced to “render” or “don’t render.” There are a whole world of options in between that we should be exploring.

## Four short links: 24 September 2015

### Machine Music Learning, Cyber War, Backing Out Ads, and COBOL OF THE 2020s

1. The Hit Charade (MIT TR) — Spotify’s deep-learning system still has to be trained using millions of example songs, and it would be perplexed by a bold new style of music. What’s more, such algorithms cannot arrange songs in a creative way. Nor can they distinguish between a truly original piece and yet another me-too imitation of a popular sound. Johnson acknowledges this limitation, and he says human expertise will remain a key part of Spotify’s algorithms for the foreseeable future.
2. The Future of War is the Distant Past (John Birmingham) — the Naval Academy is hedging against the future by creating cybersecurity midshipmen, and by requiring every midshipman to learn how to do celestial navigation.
3. What Happens Next Will Amaze You (Maciej Ceglowski) — the next in Maciej’s amazing series of keynotes, where he’s building a convincing case for fixing the Web.
4. Go Will Dominate the Next Decade (Ian Eyberg) — COBOL OF THE 2020s. There, I saved you the trouble.

## Four short links: 22 September 2015

### Ant Algorithms, Git Commit, NASA's Deep Learning, and Built-In Empathy

1. Ant Algorithms for Discrete Optimization (Adrian Colyer) — Stigmergy is the generic term for the stimulation of workers by the performance they have achieved – for example, termite nest-building works in a similar way. Stigmergy is a form of indirect communication “mediated by physical modifications of environmental states which are only locally accessible to the communicating agents.”
2. How to Write a Git Commit Message (Chris Beams) — A diff will tell you what changed, but only the commit message can properly tell you why.
3. Deep Belief Networks at the Heart of NASA Image Classification — The two new labeled satellite data sets were ultimately put to the test with a modified deep-belief-networks-driven approach. The results show classification accuracy of 97.95%, which performed better than the unmodified pure deep belief networks, convolutional neural networks, and stacked de-noising auto-encoders by around 11%.
4. The Consequences of An Insightful Algorithm (Carina C. Zona) — We design software for humans. Balancing human needs and business specs can be tough. It’s crucial that we learn how to build in systematic empathy. (via Rowan Crawford)

## Four short links: 21 September 2015

### 2-D Single-Stroke Recognizer, Autonomous Vehicle Permits, s3concurrent, and Surviving the Music Industry

1. $1 Unistroke Recognizer — a 2-D single-stroke recognizer designed for rapid prototyping of gesture-based user interfaces. In machine learning terms, $1 is an instance-based nearest-neighbor classifier with a Euclidean scoring function — i.e., a geometric template matcher.
2. Apple Talking to California Officials about Self-Driving Car (Guardian) — California DMV’s main responsibility for autonomous vehicles at present is administering an autonomous vehicle tester program for experimental self-driving cars on California’s roads. So far, 10 companies have been issued permits for about 80 autonomous vehicles and more than 300 test drivers. The most recent, Honda and BMW, received their permits last week.
3. s3concurrent — sync local file structure with s3, in parallel. (via Winston Chen)
4. Amanda Palmer on Music Industry Survival Techniques (O’Reilly Radar) — I’ve always approached every Internet platform and every Internet tool with the suspicion that it may not last, and that actually what’s very important is […] the art and the relationships I’m building.
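
The "instance-based nearest-neighbor classifier with a Euclidean scoring function" in the first link can be illustrated with a short Python sketch. This is not the $1 recognizer's actual code: it skips the rotation-alignment step and scales uniformly, and the names `resample`, `normalize`, `recognize`, and `TEMPLATES` are hypothetical, chosen only for this example.

```python
import math

def resample(points, n=32):
    """Return n points spaced evenly along the stroke's path."""
    pts = list(points)
    total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    interval = total / (n - 1)
    out, travelled, i = [pts[0]], 0.0, 1
    while i < len(pts):
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and travelled + d >= interval:
            t = (interval - travelled) / d
            q = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            travelled = 0.0
        else:
            travelled += d
        i += 1
    while len(out) < n:  # floating-point rounding can drop the last point
        out.append(pts[-1])
    return out[:n]

def normalize(points, n=32):
    """Resample, move the centroid to the origin, scale to a unit box."""
    pts = resample(points, n)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    # Assumption: uniform scaling, so straight lines keep their shape.
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in pts]

def recognize(stroke, templates):
    """Return (name, distance) of the nearest template."""
    candidate = normalize(stroke)

    def distance(name):
        template = normalize(templates[name])
        pairs = zip(candidate, template)
        return sum(math.dist(p, q) for p, q in pairs) / len(candidate)

    best = min(templates, key=distance)
    return best, distance(best)

# Hypothetical templates: each gesture is stored as one example stroke.
TEMPLATES = {
    "line": [(0, 0), (10, 0)],
    "vee": [(0, 0), (5, 10), (10, 0)],
    "caret": [(0, 10), (5, 0), (10, 10)],
}

# A larger, shifted, slightly wobbly stroke drawn in the same order as "vee".
name, score = recognize([(100, 100), (121, 139), (140, 100)], TEMPLATES)
print(name, round(score, 3))
```

Because every stroke is reduced to the same number of points in a common frame, adding a gesture only means storing one more example stroke; there is no training step, which is what makes the approach instance-based.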

## Build better machine learning models

### A beginner's guide to evaluating your machine learning models.

Everything today is being quantified, measured, and tracked — everything is generating data, and data is powerful. Businesses are using data in a variety of ways to improve customer satisfaction. For instance, data scientists are building machine learning models to generate intelligent recommendations to users so that they spend more time on a site. Analysts can use churn analysis to predict which customers are the best targets for the next promotional campaign. The possibilities are endless.

However, there are challenges in the machine learning pipeline. Typically, you build a machine learning model on top of your data. You collect more data. You build another model. But how do you know when to stop?

## When is your smart model smart enough?

Evaluation is a key step when building intelligent business applications with machine learning. It is not a one-time task, but must be integrated with the whole pipeline of developing and productionizing machine learning-enabled applications.

In a new free O’Reilly report Evaluating Machine Learning Models: A Beginner’s Guide to Key Concepts and Pitfalls, we cut through the technical jargon of machine learning, and elucidate, in simple language, the processes of evaluating machine learning models.

## Four short links: 16 September 2015

### Data Pipelines, Amazon Culture, Real-time NFL Data, and Deep Learning for Chess

1. Three Best Practices for Building Successful Data Pipelines (Michael Li) — three key areas that are often overlooked in data pipelines, and those are making your analysis: reproducible, consistent, and productionizable.
2. Amazon’s Culture Controversy Decoded (Rita J King) — very interesting culture map analysis of the reports of Amazon’s culture, and context for how companies make choices about what to be. (via Mike Loukides)
3. How Will Real-Time Tracking Change the NFL? (New Yorker) — At the moment, the NFL is being tightfisted with the data. Commentators will have access during games, as will the betting and analytics firm Sportradar. Users of the league’s Xbox One app, which provides an interactive way of browsing video clips, fantasy-football statistics, and other metrics, will be able to explore a feature called Next Gen Replay, which allows them to track each player’s speed and trajectory, combining moving lines on a virtual field with live footage from the real one. But, for now, coaches are shut out; once a player exits the locker room on game day, the dynamic point cloud that is generated by his movement through space is a corporately owned data set, as outlined in the league’s 2011 collective-bargaining agreement. Which should tell you all you need to know about the NFL’s role in promoting sporting excellence.
4. Giraffe: Using Deep Reinforcement Learning to Play Chess (Matthew Lai) — Giraffe, a chess engine that uses self-play to discover all its domain-specific knowledge, with minimal hand-crafted knowledge given by the programmer. See also the code. (via GitXiv)

## Four short links: 14 September 2015

### Robotics Boom, Apple in Communities, Picture Research, and Programming Enlightenment

1. Uber Would Like to Buy Your Robotics Department (NY Times) — ‘‘If you’re well versed in the area of robotics right now and you’re not working on self-driving cars, you’re either an idiot or you have more of a passion for something else,’’ says Jerry Pratt, head of a robotics team in Pensacola that worked on a humanoid robot that beat Carnegie Mellon’s CHIMP in this year’s contest. ‘‘It’s a multibillion- if not trillion-dollar industry.’’
2. What the Heck is Angela Ahrendts Doing at Apple? (Fortune) — Apple has always intended for each of them to be a community center; now Cook and Ahrendts want them to be the community center. That means expanding from serving existing and potential customers to, say, creating opportunities for underserved minorities and women. “In my mind,” Ahrendts says, store leaders “are the mayors of their community.”
3. Imitation vs. Innovation: Product Similarity Network in the Motion Picture Industry (PDF) — machine learning to build a model of movies released in the last few decades. We find that big-budget movies benefit more from imitation, but small-budget movies favor novelty. This leads to interesting market dynamics that cannot be produced by a model without learning.
4. Enlightened Imagination for Citizens (Bret Victor) — It should be painfully obvious that learning how to program a computer has no direct connection to any high form of enlightenment. Amen!

## Four short links: 1 September 2015

### People Detection, Ratings Patterns, Inspection Bias, and Cloud Filesystem

1. End-to-End People Detection in Crowded Scenes — research paper and code. When parsing the title, bind “end-to-end” to “scenes” not “people”.
2. Statistical Patterns in Movie Ratings (PLOSone) — We find that the distribution of votes presents scale-free behavior over several orders of magnitude, with an exponent very close to 3/2, with exponential cutoff. It is remarkable that this pattern emerges independently of movie attributes such as average rating, age and genre, with the exception of a few genres and of high-budget films.
3. The Inspection Bias is Everywhere — In 1991, Scott Feld presented the “friendship paradox”: the observation that most people have fewer friends than their friends have. He studied real-life friends, but the same effect appears in online networks: if you choose a random Facebook user, and then choose one of their friends at random, the chance is about 80% that the friend has more friends. The friendship paradox is a form of the inspection paradox. When you choose a random user, every user is equally likely. But when you choose one of their friends, you are more likely to choose someone with a lot of friends. Specifically, someone with x friends is overrepresented by a factor of x.
4. s3ql — a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. (GPLv3)
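
The "overrepresented by a factor of x" claim in the third link can be checked with a few lines of Python. This is a minimal sketch on a made-up five-person network, and it assumes the friend is drawn as a uniformly random end of a friendship (which weights each person by their friend count); the `friendships` data and both function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def mean_degree(degrees):
    """Average friend count when every user is equally likely."""
    return Fraction(sum(degrees), len(degrees))

def size_biased_mean_degree(degrees):
    """Average friend count when a user with x friends is x times as likely."""
    return Fraction(sum(d * d for d in degrees), sum(degrees))

# Hypothetical network: one well-connected "hub" and four people who know only the hub.
friendships = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]

# Count how many friendships each person appears in.
counts = Counter(person for pair in friendships for person in pair)
degrees = list(counts.values())

print(mean_degree(degrees))              # 8/5
print(size_biased_mean_degree(degrees))  # 5/2
```

A randomly chosen person here has 1.6 friends on average, while a person reached through a friendship has 2.5, because the hub is counted once for each of its four friendships.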