"predictive analytics" entries
The O'Reilly Radar Podcast: Cait O'Riordan on Shazam's predictive analytics, and Francine Bennett on using data for evil.
Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.
In this week’s Radar Podcast, I chat with Cait O’Riordan, VP of product, music and platforms at Shazam. She talks about the current state of predictive analytics and how Shazam is able to predict the success of a song, often in the first few hours after its release. We also talk about the Internet of Things and how products like the Apple Watch affect Shazam’s product life cycles as well as the behaviors of their users.
Predicting the next pop hit
Shazam has more than 100 million monthly active users, and its users Shazam more than 20 million times per day. This, of course, generates a ton of data that Shazam uses in myriad ways, not the least of which is to predict the success of a song. O’Riordan explained how they approach their user data and how they’re able to accurately predict pop hits (and misses):
What’s interesting from a data perspective is when someone takes their phone out of their pocket, unlocks it, finds the Shazam app, and hits the big blue button, they’re not just saying, “I want to know the name of this song.” They’re saying, “I like this song sufficiently to do that.” There’s an amount of effort there that implies some level of liking. That’s really interesting, because you combine that really interesting intention on the part of the user plus the massive data set, you can cut that in lots and lots of different ways. We use it for lots of different things.
At the most basic level, we’re looking at what songs are going to be popular. We can predict, with a relative amount of accuracy, what will hit the Top 100 Billboard Chart 33 days out, roughly. We can look at that in lots of different territories as well. We can also look and see, in the first few hours of a track, whether a big track is going to go on to be successful. We can look at which particular part of the track is encouraging people to Shazam and what makes a popular hit. We know that, for example, for a big pop hit, you’ve got about 10 seconds to convince somebody to find the Shazam app and press that button. There are lots of different ways that we can look at that data, going right into the details of a particular song, zooming out worldwide, or looking in different territories just due to that big worldwide and very engaged audience.
Identifying the right evaluation methods is essential to successful machine learning.
We all know that a working predictive model is a powerful business weapon. By translating data into insights and subsequent actions, businesses can offer better customer experience, retain more customers, and increase revenue. This is why companies are now allocating more resources to develop, or purchase, machine learning solutions.
While expectations for predictive analytics are sky high, implementing machine learning in a business is not necessarily a smooth path. Interestingly, the problem is often not the quality of the data or the algorithms. I have worked with a number of companies that collected a lot of data; ensured the quality of the data; used research-proven algorithms implemented by well-educated data scientists; and yet failed to see beneficial outcomes. What went wrong? Doesn’t good data plus a good algorithm equal beneficial insights?
In-depth Strata community profile on Kira Radinsky
Kira Radinsky started coding at the age of four, when her mother and aunt encouraged her to excel at one of her favorite computer games by writing a few simple lines of code. Since then, she’s been a firecracker in the field of predictive analytics, building algorithms to improve business interactions and create a data-driven economy and, in the past, building systems to detect outbreaks of disease and social unrest around the world. She also gave a predictive analytics talk at the last Strata.
I had a conversation with Kira last month about her entry into the field and her most exciting moments thus far.
When did you first become interested in science?
Kira Radinsky: When I was four or five, my mom bought me a computer game. In order to go to the next level, you had to solve simple math problems, which became increasingly harder with time. At one point I couldn’t solve one of the problems. Then I asked my aunt for help because she was a software engineer. She showed me how to write some very simple code in order to proceed to the next level in the game. This was the first time I actually coded something.
In the army, I was a software engineer. I built big systems. I felt that I was contributing to my country and it was amazing for me. When I finished my service, I was accepted to the excellence program at the Technion [Israel Institute of Technology] because I had already started studying there when I was 15. I just continued on to a graduate degree.
I knew I wanted to do something in the field of artificial intelligence, because I really wanted to pursue the idea of using computers to make a global impact. I was really into that. I realized that the vast data amounts that we produce could be used to solve important problems.
In 2011, thousands of birds fell out of the sky on New Year’s Eve. People were writing, “we don’t know what’s going on.” It was a conundrum. A few days later, a hundred thousand fish washed up dead on the shore. Many people were saying that it was the end of the world because it was the end of the Mayan calendar!
Hypothesis-free data analysis turns up unexpected incidences of illness
This posting was written by guest author Arijit Sengupta, CEO of BeyondCore. Arijit will speak at Strata Rx 2013 on the kinds of data analysis discussed in this post.
Much of the effort in health reform in the United States has gone toward recruiting 18-to-35-year-olds into the insurance pool so that the US economy and insurers can afford the Affordable Care Act (ACA). The assumption here is that health care costs will be lower for this young population than for other people, but is this true? Our recent analysis of 6.8 million insured young adults, across 200,000 variable combinations, suggests that young adults may be more expensive to insure than we realize.
Our study shows a high occurrence of mental health diseases among 18-to-35-year-olds who have insurance and therefore more affordable access to medical care. Moreover, expenses associated with mental health conditions are very high, especially when coupled with a physical ailment. As the previously uninsured 18-to-35-year-olds get access to affordable care, we may see a similarly high rate of mental health diagnoses among this population. The bad news is that the true costs of insuring 18-to-35-year-olds might be much higher than previously suspected. The good news is that previously undiagnosed and untreated mental health conditions may now actually get diagnosed and treated, creating a significant societal benefit.
Health data can go beyond the averages and first order patient characteristics to find long-term trends
This article was written with Arijit Sengupta, CEO of BeyondCore. Tim and Arijit will speak at Strata Rx 2013 on the topic of this post.
Current healthcare cost prevention efforts focus on the top 1% of highest risk patients. As care coordination efforts expand to a larger set of the patient population, the critical question is: If you’re a care manager, which patients should you offer additional care to at any given point in time? Our research shows that focusing on patients with the highest risk scores or highest current costs creates suboptimal roadmaps. In this article we share an approach to predict patients whose costs are about to skyrocket, using a hypothesis-free micro-segmentation analysis. From there, working with physicians and care managers, we can formulate appropriate interventions.
Arijit Sengupta of BeyondCore uncovers hidden relationships in public health data
The importance of visualizing data is universally recognized. But usually the data is a passive input to some visualization tool, and users have to specify the precise graph they want to see. BeyondCore simplifies this process by automatically evaluating millions of variable combinations to determine which graphs are the most interesting, and then highlights these to users. In essence, BeyondCore automatically tells us the right questions to ask of our data.
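To make the idea of scanning variable combinations concrete, here is a minimal Python sketch. It is not BeyondCore's method, which the post does not describe. The function name `rank_segments`, the scoring rule (gap from the overall mean, weighted by the square root of segment size), the column names, and the toy rows are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def rank_segments(rows, outcome, max_vars=2, min_size=2):
    """Score every segment defined by up to `max_vars` categorical columns.

    Hypothetical scoring rule: |segment mean - overall mean| * sqrt(segment size).
    """
    overall = mean(r[outcome] for r in rows)
    columns = [c for c in rows[0] if c != outcome]
    results = []
    for k in range(1, max_vars + 1):
        for cols in combinations(columns, k):
            # Group the outcome values by the values of the chosen columns.
            groups = defaultdict(list)
            for r in rows:
                groups[tuple(r[c] for c in cols)].append(r[outcome])
            for values, outcomes in groups.items():
                if len(outcomes) < min_size:
                    continue  # skip segments too small to say anything about
                gap = mean(outcomes) - overall
                score = abs(gap) * len(outcomes) ** 0.5
                results.append((score, dict(zip(cols, values)), gap, len(outcomes)))
    # Highest-scoring ("most interesting") segments first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Invented toy data, not taken from any study mentioned on this page.
rows = [
    {"region": "north", "plan": "plus", "cost": 900},
    {"region": "north", "plan": "plus", "cost": 1100},
    {"region": "north", "plan": "basic", "cost": 200},
    {"region": "north", "plan": "basic", "cost": 300},
    {"region": "south", "plan": "plus", "cost": 400},
    {"region": "south", "plan": "plus", "cost": 500},
    {"region": "south", "plan": "basic", "cost": 300},
    {"region": "south", "plan": "basic", "cost": 300},
]

ranked = rank_segments(rows, "cost")
for score, segment, gap, size in ranked[:3]:
    print(segment, "gap from overall mean:", gap, "rows:", size)
```

In this toy data, the two-variable segment (north, plus) ranks first even though neither variable alone stands out as strongly, which is the kind of relationship a scan like this is meant to surface without a starting hypothesis. A real system would also need to correct for the large number of combinations tested.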
In this video, Arijit Sengupta, CEO of BeyondCore, describes how public health data can be analyzed in real-time to discover anomalies and other intriguing relationships, making them readily accessible even to viewers without a statistical background. Arijit will be speaking at Strata Rx 2013 with Tim Darling of Objective Health, a McKinsey Solution for Healthcare Providers, on the topic of this post.
Using data science to predict the Oscars
Sophisticated algorithms are not going to write the perfect script or crawl YouTube to find the next Justin Bieber (that last one I think we can all be thankful for!). But a model can predict the probability of a nominee winning the Oscar, and recently our model has Argo overtaking Lincoln as the likely winner of Best Picture. Every day on FarsiteForecast.com we’ve been describing applications of data science for the media and entertainment industry, illustrating how our models work, and updating the likely winners based on the outcomes of the Awards Season leading up to the Oscars. Just as predictive analytics provides valuable decision-making tools in sectors from retail to healthcare to advocacy, data science can also empower smarter decisions for entertainment executives, which led us to launch the Oscar forecasting project. While the potential for data science to impact any organization is as unique as each company itself, we thought we’d offer a few use cases that have wide application for media and entertainment organizations.
William Gibson's apt predictions, why C matters, and a vote against lightweight DRM.
This week on O'Reilly: James Turner noted that the corporate dystopia predicted in "Neuromancer" has come to pass, author David Griffith discussed C's continued popularity, and Joe Wikert explained why lightweight ebook DRM isn't viable.