Many data scientists are comfortable working with structured operational data and unstructured text. Newer techniques like deep learning have opened up data types like images, video, and audio.
Other common data sources are garnering attention. With the rise of mobile phones equipped with GPS, I’m meeting many more data scientists at start-ups and large companies who specialize in spatio-temporal pattern recognition. Analyzing “moving dots” requires specialized tools and techniques.
A few months ago, I sat down with Rajiv Maheswaran, founder and CEO of Second Spectrum, a company that applies analytics to sports tracking data. Maheswaran talked about this new kind of data and the challenge of finding patterns:
“It’s interesting because it’s a new type of data problem. Everybody knows that big data machine learning has done a lot of stuff in structured data, in photos, in translation for language, but moving dots is a very new kind of data where you haven’t figured out the right feature set to be able to find patterns from. There’s no language of moving dots, at least not that computers understand. People understand it very well, but there’s no computational language of moving dots that are interacting. We wanted to build that up, mostly because data about moving dots is very, very new. It’s only in the last five years, between phones and GPS and new tracking technologies, that moving data has actually emerged.”
Orders of magnitude higher than Moneyball box score data
In professional sports teams, Second Spectrum found an audience struggling to make sense of large amounts of “tracking data.” Maheswaran explained:
“It turns out that sports is one of the areas where they have really, really great data. For example: in GPS, there’s noise issues and in phones, there are sampling issues; in sports in the last year, there have been tracking technologies placed in all major sports where they’re tracking all the players and the ball at a very, very high frame rate. You get all this really fantastic movement data, and then you have all these people like coaches and front offices and media who want to find patterns in this data. There’s great data, there are people who really care about the problem, and it’s a great place to build this science.
“…It’s kind of like Moneyball, except three orders of magnitude higher because Moneyball was using box score data, which is hundreds of events a game, and now with tracking data, you have millions of events per game. Now there are positions of every sort of player and the ball, so the question is: if I have a big pile of numbers, somebody just dumped a million numbers about the game, how do I actually extract value from that? How do you extract patterns, meanings, stories, and semantics that people care about: that’s the challenge.”
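To make the scale and the feature problem a little more concrete, here is a minimal Python sketch. The frame rate, game length, and number of tracked objects are illustrative assumptions, not figures from the interview or from Second Spectrum, and the helper names `position_samples` and `speeds` are hypothetical. It counts the raw position samples a single game might produce, then derives one very simple feature (per-frame speed) from a short track of (x, y) positions.

```python
import math

# Assumed, illustrative parameters (not taken from the interview).
FRAME_RATE_HZ = 25          # assumed samples per second
GAME_SECONDS = 48 * 60      # assumed playing time: a 48-minute basketball game
TRACKED_OBJECTS = 11        # assumed: 10 players plus the ball


def position_samples(frame_rate_hz, seconds, objects):
    """Number of (x, y) position samples recorded over the game."""
    return frame_rate_hz * seconds * objects


def speeds(track, frame_rate_hz):
    """Per-frame speed for one object's track, a list of (x, y) positions."""
    dt = 1.0 / frame_rate_hz  # time between consecutive frames, in seconds
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


samples = position_samples(FRAME_RATE_HZ, GAME_SECONDS, TRACKED_OBJECTS)
print(samples)  # 792000

# A hypothetical three-frame track, with positions in feet.
track = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8)]
print(speeds(track, FRAME_RATE_HZ))  # two speed values, in feet per second
```

Under these assumptions, one game yields several hundred thousand position samples, or well over a million individual coordinates. Speed is only the simplest derived quantity; the patterns Maheswaran describes involve interactions among many moving dots at once, which this sketch does not attempt to model.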
Real-time data will change the game
In the near future, coaches will have access to this data and to analytic tools during live game situations. This would be an interesting example of one of the trends we’re tracking, cognitive augmentation: the combination of big data, algorithms, and efficient user interfaces. Maheswaran said it’s just a matter of time and increasing interest:
“I think that’s going to happen very soon. It’s not the case today, but I would at least hope that within three years, once we have all this data in real time, coaches will want to do it. It will change the game. It’s going to go out to the fans anyway, so if the coaches don’t get it, the fans are going to see all kinds of information in real time that the coaches won’t have, so eventually the coaches will want that. I think that everybody in the sports leagues want to improve the game, and I think just having this data in real time is going to facilitate that.”
Listen to the full podcast in the player above or through SoundCloud.