"applied machine learning" entries

Unpacking technical jargon in machine learning

A new report explores how to evaluate your machine learning models.


Editor’s note: This is an excerpt of “Evaluating Machine Learning Models,” by Alice Zheng.


Alice Zheng will be part of the Data Science Summit and Dato Conference in July — a non-profit event jointly organized by Intel, Comcast, Pandora, Dato, Cloudera, and O’Reilly Media — in San Francisco. Visit the conference website for more information on the program. Use the discount code OREILLY20 and get 20% off either one or both days of the conference.

This report on evaluating machine learning models arose out of a sense of need. The content was first published as a series of six technical posts on the Dato Machine Learning Blog. I was the editor of the blog, and I needed something to publish for the next day. Dato builds machine learning tools that help users build intelligent data products. In our conversations with the community, we sometimes ran into confusion over terminology. For example, people would ask for cross validation as a feature, when what they really meant was hyperparameter tuning, a feature we already had. So, I thought, “Aha! I’ll just quickly explain what these concepts mean and point folks to the relevant sections in the user guide.”

I sat down to write a blog post to explain cross validation, hold-out data sets, and hyperparameter tuning. After the first two paragraphs, however, I realized that it would take a lot more than a single blog post. The three terms sit at different depths in the concept hierarchy of machine learning model evaluation. Cross validation and hold-out validation are ways of chopping up a data set in order to measure the model’s performance on “unseen” data. Hyperparameter tuning, on the other hand, is a more “meta” process of model selection. But why does the model need “unseen” data, and what’s meta about hyperparameters? In order to explain all of that, I needed to start from the basics. First, I needed to explain the high-level concepts and how they fit together. Only then could I dive into each one in detail.
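
To make the distinction concrete, here is a minimal sketch of the three ideas side by side. It uses scikit-learn rather than the Dato toolkit discussed above, and the dataset, model, and parameter grid are illustrative assumptions, not anything prescribed by the report.

# Sketch only: scikit-learn stands in for the toolkit mentioned in the text.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

X, y = load_iris(return_X_y=True)

# Hold-out validation: set aside a chunk of data the model never trains on,
# then measure performance on that "unseen" portion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# Cross validation: chop the training data into k folds and rotate which fold
# plays the role of "unseen" data, averaging the scores.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print("5-fold CV accuracy:", scores.mean())

# Hyperparameter tuning: the "meta" step. Search over settings of the model
# (here, the regularization strength C), using cross validation to score each candidate.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print("best C:", search.best_params_)
print("tuned model's hold-out accuracy:", search.score(X_test, y_test))

Note how the two validation schemes answer "how well does this model do on data it hasn't seen?", while the grid search sits one level above them, using those answers to choose among candidate models.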

A good nudge trumps a good prediction

Identifying the right evaluation methods is essential to successful machine learning.

Editor’s note: This is part of our investigation into analytic models and best practices for their selection, deployment, and evaluation.

We all know that a working predictive model is a powerful business weapon. By translating data into insights and subsequent actions, businesses can offer a better customer experience, retain more customers, and increase revenue. This is why companies are now allocating more resources to developing, or purchasing, machine learning solutions.

While expectations for predictive analytics are sky-high, the implementation of machine learning in businesses is not necessarily a smooth path. Interestingly, the problem often is not the quality of the data or the algorithms. I have worked with a number of companies that collected a lot of data, ensured the quality of that data, and used research-proven algorithms implemented by well-educated data scientists; and yet, they failed to see beneficial outcomes. What went wrong? Doesn’t good data plus a good algorithm equal beneficial insights?