Areas concerned with shapes, invariants, and dynamics in high dimensions are proving useful in data analysis
I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology and differential and algebraic geometry aren’t exactly areas you associate with data science. But on further reflection, perhaps it shouldn’t be so surprising that areas that deal in shapes, invariants, and dynamics in high dimensions would have something to contribute to the analysis of large data sets. Without further ado, here are a few examples that stood out for me. (If you know of other examples of recent applications of math in data analysis, please share them in the comments.)
Compressed sensing is a signal processing technique that makes efficient data collection possible. For example, using compressed sensing, images can be reconstructed from small amounts of data. Idealized sampling collects just enough information to measure the most important components. Vastly decreasing the number of measurements means that less data needs to be stored and that less time and energy are needed to collect signals. Already there have been applications in medical imaging and mobile phones.
The problem is that you don’t know ahead of time which signals or components are important. A series of numerical experiments led Emmanuel Candès to believe that random samples may be the answer. The theoretical foundation for why a random set of signals would work was laid down in a series of papers by Candès and Fields Medalist Terence Tao.
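To make the idea concrete, here is a minimal sketch of recovering a sparse signal from a small number of random measurements. It uses orthogonal matching pursuit, a simple greedy method, rather than the convex optimization approach analyzed by Candès and Tao. The function names (`random_sensing_matrix`, `measure`, `omp`) and the problem sizes are assumptions chosen purely for illustration.

```python
import math
import random


def random_sensing_matrix(m, n, rng):
    """Return an m x n Gaussian matrix (list of rows) with unit-norm columns."""
    cols = []
    for _ in range(n):
        col = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in col))
        cols.append([v / norm for v in col])
    return [[cols[j][i] for j in range(n)] for i in range(m)]


def measure(A, x):
    """Take the m linear measurements y = A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def solve(G, b):
    """Solve a small square system G z = b by Gaussian elimination."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for c in reversed(range(k)):
        s = M[c][k] - sum(M[c][j] * z[j] for j in range(c + 1, k))
        z[c] = s / M[c][c]
    return z


def omp(A, y, k):
    """Greedy estimate of a k-sparse x from y = A x (orthogonal matching pursuit)."""
    m, n = len(A), len(A[0])

    def col(j):
        return [A[i][j] for i in range(m)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    support, coef, residual = [], [], y[:]
    for _ in range(k):
        # Pick the unused column most correlated with the current residual.
        unused = [j for j in range(n) if j not in support]
        support.append(max(unused, key=lambda j: abs(dot(col(j), residual))))
        # Least-squares fit of y on the chosen columns (normal equations).
        S = [col(j) for j in support]
        G = [[dot(u, v) for v in S] for u in S]
        coef = solve(G, [dot(u, y) for u in S])
        residual = [y[i] - sum(c * s[i] for c, s in zip(coef, S)) for i in range(m)]
    x_hat = [0.0] * n
    for j, c in zip(support, coef):
        x_hat[j] = c
    return x_hat


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, k = 40, 20, 3            # signal length, measurements, nonzeros
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.uniform(1.0, 3.0)
    A = random_sensing_matrix(m, n, rng)
    x_hat = omp(A, measure(A, x), k)
    print("largest reconstruction error:", max(abs(a - b) for a, b in zip(x, x_hat)))
```

Here only 20 random measurements are taken of a length-40 signal with 3 nonzero components. For sizes like these, the greedy method often, but not always, identifies the right components; the theoretical guarantees in the literature come with conditions on the measurement matrix and the sparsity level.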
Nikki Graziano’s intriguing integration of mathematical curves into her photography sparked a Radar discussion about the relationship between mathematics and the real world. Does her work give insight into the nature of mathematics? Or into the nature of the world? And if so, what kind of insight? Mathematically, matching one curve to another isn’t a big deal. Finding an equation that matches the curve of an artfully trimmed hedge is easy. The question is whether that curve tells us anything, or whether it’s just another stupid math trick.
One of the largest gatherings of mathematicians, the joint meetings of the AMS/MAA/SIAM, took place last week in San Francisco. Knowing that there were going to be over 6,000 pure and applied mathematicians at Moscone West, I took some time off from work and attended several sessions. Below are a few (somewhat technical) highlights. It’s the only conference I’ve attended where the person managing the press room was also working on some equations in between helping the media.
This morning, Tim Bray tweeted about a post on prime numbers and Benford’s law. To cut the esoterica short, one of the big problems in prime numbers is that people don’t know how they’re distributed. This post suggests that Benford’s Law describes the distribution of the first digit of prime numbers. One of the comments asked an important question: is this really just an artifact of base 10? Math really doesn’t “know anything” about bases, so if this idea doesn’t generalize to bases other than 10, it doesn’t mean much.
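The base question is easy to explore numerically. The sketch below tallies the leading digits of the primes below a limit in any base and compares them with the Benford prediction for that base, P(d) = log_b(1 + 1/d). The helper names (`primes_below`, `leading_digit`, `leading_digit_counts`, `benford`) and the limit are assumptions made for illustration, and the code takes no position on whether the primes actually follow the law.

```python
import math


def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly less than limit."""
    is_prime = [True] * limit
    is_prime[:2] = [False] * min(2, limit)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, limit, p):
                is_prime[q] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def leading_digit(n, base):
    """Most significant digit of a positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n


def leading_digit_counts(numbers, base):
    """Count how often each leading digit 1..base-1 occurs."""
    counts = {d: 0 for d in range(1, base)}
    for n in numbers:
        counts[leading_digit(n, base)] += 1
    return counts


def benford(d, base):
    """Benford's law probability of leading digit d in the given base."""
    return math.log(1 + 1 / d, base)


if __name__ == "__main__":
    primes = primes_below(100_000)
    for base in (10, 16):
        counts = leading_digit_counts(primes, base)
        print(f"base {base}")
        for d, c in counts.items():
            print(f"  digit {d:2d}: observed {c / len(primes):.3f}, "
                  f"Benford {benford(d, base):.3f}")
```

Changing the limit and the list of bases shows how the observed frequencies move, which is the kind of check the commenter’s question calls for.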