A new report explores how to evaluate your machine learning models.
Get notified when our free report “Evaluating Machine Learning Models: A beginner’s guide to key concepts and pitfalls” is available for download. Editor’s note: This is an excerpt of “Evaluating Machine Learning Models,” by Alice Zheng.
Alice Zheng will be part of the Data Science Summit and Dato Conference in July — a non-profit event jointly organized by Intel, Comcast, Pandora, Dato, Cloudera, and O’Reilly Media — in San Francisco. Visit the conference website for more information on the program. Use the discount code OREILLY20 and get 20% off either one or both days of the conference.
This report on evaluating machine learning models arose out of a sense of need. The content was first published as a series of six technical posts on the Dato Machine Learning Blog. I was the editor of the blog, and I needed something to publish for the next day. Dato builds machine learning tools that help users build intelligent data products. In our conversations with the community, we sometimes ran into confusion over terminology. For example, people would ask for cross validation as a feature, when what they really meant was hyperparameter tuning, a feature we already had. So, I thought, “Aha! I’ll just quickly explain what these concepts mean and point folks to the relevant sections in the user guide.”
I sat down to write a blog post to explain cross validation, hold-out data sets, and hyperparameter tuning. After the first two paragraphs, however, I realized that it would take a lot more than a single blog post. The three terms sit at different depths in the concept hierarchy of machine learning model evaluation. Cross validation and hold-out validation are ways of chopping up a data set in order to measure the model’s performance on “unseen” data. Hyperparameter tuning, on the other hand, is a more “meta” process of model selection. But why does the model need “unseen” data, and what’s meta about hyperparameters? In order to explain all of that, I needed to start from the basics. First, I needed to explain the high-level concepts and how they fit together. Only then could I dive into each one in detail.
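To make the distinction concrete, here is a minimal Python sketch of how the three ideas can fit together. It is an illustration under stated assumptions and is not taken from the report: the toy data, the one-parameter ridge model, and the helper names (`k_fold_indices`, `fit_ridge`, `cross_val_error`) are all hypothetical. Cross validation chops the training data into folds, hyperparameter tuning wraps that procedure in a search over candidate settings, and a hold-out set is kept aside for one final check.

```python
import random


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs that split range(n) into k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, valid


def fit_ridge(xs, ys, lam):
    """Fit y ~ w * x with an L2 penalty; lam is the hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_error(xs, ys, lam, k=5):
    """Average validation error over k folds for one hyperparameter value."""
    errors = []
    for train, valid in k_fold_indices(len(xs), k):
        w = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(mse(w, [xs[i] for i in valid], [ys[i] for i in valid]))
    return sum(errors) / k


# Toy data (an assumption for illustration): y = 2x plus a little noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

# Hold-out validation: the last 12 points are never used during tuning.
train_x, test_x = xs[:48], xs[48:]
train_y, test_y = ys[:48], ys[48:]

# Hyperparameter tuning: the "meta" loop that compares candidate settings,
# using cross validation on the training data to score each one.
grid = [0.0, 0.1, 1.0, 10.0]
best_lam = min(grid, key=lambda lam: cross_val_error(train_x, train_y, lam))

# Final model, scored once on data it has not seen.
final_w = fit_ridge(train_x, train_y, best_lam)
print("chosen lam:", best_lam, "hold-out MSE:", mse(final_w, test_x, test_y))
```

The point of the sketch is the nesting: cross validation measures performance for one setting, while tuning repeats that measurement across settings and picks a winner.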
Areas concerned with shapes, invariants, and dynamics in high dimensions are proving useful in data analysis
I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology and differential and algebraic geometry aren’t exactly areas you associate with data science. But upon further reflection, perhaps it shouldn’t be so surprising that areas that deal in shapes, invariants, and dynamics in high dimensions would have something to contribute to the analysis of large data sets. Without further ado, here are a few examples that stood out for me. (If you know of other examples of recent applications of math in data analysis, please share them in the comments.)
Compressed sensing is a signal processing technique that makes efficient data collection possible. For example, with compressed sensing, images can be reconstructed from small amounts of data. Idealized sampling is used to collect information to measure the most important components. Vastly decreasing the number of measurements to be collected means less data needs to be stored, and it reduces the amount of time and energy needed to collect signals. Already there have been applications in medical imaging and mobile phones.
The problem is that you don’t know ahead of time which signals/components are important. A series of numerical experiments led Emmanuel Candès to believe that random samples may be the answer. The theoretical foundation for why a random set of signals would work was laid down in a series of papers by Candès and Fields Medalist Terence Tao.
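As a rough illustration of the idea, the Python sketch below takes a small number of random measurements of a sparse signal and tries to recover it with orthogonal matching pursuit, a simple greedy method. This is a swapped-in technique chosen for brevity, not the approach analyzed in the Candès and Tao papers. The function names, the Gaussian measurement matrix, and the problem sizes are all assumptions made for this example, and recovery of signals with several nonzero entries is likely but not guaranteed at these sizes.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, b):
    """Solve a small square system M z = b by Gaussian elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def random_columns(n, m, seed=0):
    """n random Gaussian columns of length m (the measurement matrix)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]


def measure(columns, x):
    """Compute the m measurements y = A x, where A has the given columns."""
    m = len(columns[0])
    return [sum(x[j] * columns[j][i] for j in range(len(x))) for i in range(m)]


def omp(columns, y, sparsity):
    """Greedy recovery: pick one column per step, then refit by least squares."""
    residual, support, coeffs = y[:], [], []
    for _ in range(sparsity):
        scores = [abs(dot(col, residual)) / dot(col, col) ** 0.5 for col in columns]
        for j in support:
            scores[j] = -1.0  # never pick the same column twice
        support.append(max(range(len(columns)), key=scores.__getitem__))
        # Least squares restricted to the chosen columns (normal equations).
        gram = [[dot(columns[a], columns[b]) for b in support] for a in support]
        coeffs = solve(gram, [dot(columns[a], y) for a in support])
        residual = [yi - sum(c * columns[j][i] for c, j in zip(coeffs, support))
                    for i, yi in enumerate(y)]
    x_hat = [0.0] * len(columns)
    for c, j in zip(coeffs, support):
        x_hat[j] = c
    return x_hat


# A length-50 signal with three nonzero entries, seen through 20 measurements.
A = random_columns(n=50, m=20)
x_true = [0.0] * 50
x_true[4], x_true[17], x_true[33] = 1.5, -2.0, 0.7
x_hat = omp(A, measure(A, x_true), sparsity=3)
print("largest recovery error:", max(abs(a - b) for a, b in zip(x_true, x_hat)))
```

Note that the measurements are random and chosen without knowing where the nonzero entries are, which is the situation the paragraph above describes.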
Nikki Graziano’s intriguing integration of mathematical curves into her photography sparked a Radar discussion about the relationship between mathematics and the real world. Does her work give insight into the nature of mathematics? Or into the nature of the world? And if so, what kind of insight? Mathematically, matching one curve to another isn’t a big deal. Finding an equation that matches the curve of an artfully trimmed hedge is easy. The question is whether that curve tells us anything, or whether it’s just another stupid math trick.
One of the largest gatherings of mathematicians, the joint meetings of the AMS/MAA/SIAM, took place last week in San Francisco. Knowing that there were going to be over 6,000 pure and applied mathematicians at Moscone West, I took some time off from work and attended several sessions. Below are a few (somewhat technical) highlights. It’s the only conference I’ve attended where the person managing the press room was also working on some equations in between helping the media.
This morning, Tim Bray tweeted about a post on prime numbers and Benford’s law. To cut the esoterica short, one of the big problems in prime numbers is that people don’t know how they’re distributed. This post suggests that Benford’s law describes the distribution of the first digit of prime numbers. One of the comments asked an important question: is this really just an artifact of base 10? Math really doesn’t “know anything” about bases, so if this idea doesn’t generalize to bases other than 10, it doesn’t mean much.
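One way to probe the base question is to tabulate leading digits of primes in several bases and set them beside the Benford prediction, which for leading digit d in base b is log_b(1 + 1/d). The Python sketch below does this. The cutoff of 100,000 and the choice of bases are arbitrary assumptions, and the output is something to inspect, not evidence for or against the post’s claim.

```python
import math
from collections import Counter


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def first_digit(n, base):
    """Leading digit of the positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n


def benford(d, base):
    """Benford's law: predicted share of numbers whose leading digit is d."""
    return math.log(1 + 1 / d, base)


def leading_digit_frequencies(numbers, base):
    """Observed share of each possible leading digit 1..base-1."""
    counts = Counter(first_digit(n, base) for n in numbers)
    return {d: counts[d] / len(numbers) for d in range(1, base)}


primes = primes_up_to(100_000)
for base in (10, 8, 16):
    observed = leading_digit_frequencies(primes, base)
    print(f"base {base}")
    for d in range(1, base):
        print(f"  digit {d}: observed {observed[d]:.3f}, "
              f"Benford {benford(d, base):.3f}")
```

Changing the cutoff as well as the base is worthwhile, since the observed frequencies depend on where the list of primes stops.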