"predictive analytics" entries

How Shazam predicts pop hits

The O'Reilly Radar Podcast: Cait O'Riordan on Shazam's predictive analytics, and Francine Bennett on using data for evil.

Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.

In this week’s Radar Podcast, I chat with Cait O’Riordan, VP of product, music and platforms at Shazam. She talks about the current state of predictive analytics and how Shazam is able to predict the success of a song, often in the first few hours after its release. We also talk about the Internet of Things and how products like the Apple Watch affect Shazam’s product life cycles as well as the behaviors of their users.

Predicting the next pop hit

Shazam has more than 100 million monthly active users, and its users Shazam more than 20 million times per day. This, of course, generates a ton of data that Shazam uses in myriad ways, not the least of which is to predict the success of a song. O’Riordan explained how they approach their user data and how they’re able to accurately predict pop hits (and misses):

What’s interesting from a data perspective is when someone takes their phone out of their pocket, unlocks it, finds the Shazam app, and hits the big blue button, they’re not just saying, “I want to know the name of this song.” They’re saying, “I like this song sufficiently to do that.” There’s an amount of effort there that implies some level of liking. That’s really interesting, because you combine that really interesting intention on the part of the user plus the massive data set, you can cut that in lots and lots of different ways. We use it for lots of different things.

At the most basic level, we’re looking at what songs are going to be popular. We can predict, with a relative amount of accuracy, what will hit the Top 100 Billboard Chart 33 days out, roughly. We can look at that in lots of different territories as well. We can also look and see, in the first few hours of a track, whether a big track is going to go on to be successful. We can look at which particular part of the track is encouraging people to Shazam and what makes a popular hit. We know that, for example, for a big pop hit, you’ve got about 10 seconds to convince somebody to find the Shazam app and press that button. There are lots of different ways that we can look at that data, going right into the details of a particular song, zooming out worldwide, or looking in different territories just due to that big worldwide and very engaged audience.

Read more…


Forecasting events, from disease outbreaks to sales to cancer research

The O'Reilly Data Show Podcast: Kira Radinsky on predicting events using machine learning, NLP, and semantic analysis.

Editor’s note: One of the more popular speakers at Strata + Hadoop World, Kira Radinsky was recently profiled in the new O’Reilly Radar report, Women in Data: Cutting-Edge Practitioners and Their Views on Critical Skills, Background, and Education.

When I first took over organizing Hardcore Data Science at Strata + Hadoop World, one of the first speakers I invited was Kira Radinsky. Radinsky had already garnered international recognition for her work forecasting real-world events (disease outbreak, riots, etc.). She’s currently the CTO and co-founder of SalesPredict, a start-up using predictive analytics to “understand who’s ready to buy, who may buy more, and who is likely to churn.”

I recently had a conversation with Radinsky, and she took me through the many techniques and subject domains from her past and present research projects. In grad school, she helped build a predictive system that combined newspaper articles, Wikipedia, and other open data sets. Through fine-tuned semantic analysis and NLP, Radinsky and her collaborators devised new metrics of similarity between events. The techniques she developed for that predictive software system are now the foundation of applications across many areas.

Read more…


Designing on a system level

Andy Goodman on service design, embeddables, and predictive analytics.


I recently sat down with Andy Goodman, designer and group director of Fjord’s US studios. Goodman has been designing and managing design teams around the globe for the past 20 years. Goodman is a contributor to Designing for Emerging Technologies — our conversation covers embeddables, wearables, and predictive analytics. To kick off the conversation, I asked Goodman to define “service design”:

“It’s well-known that if you ask a service designer to define “service design,” you get 10 different answers. For me, it’s really about thinking on a system level about design … It’s thinking about how systems, and not just computer systems, but how human systems and computer systems and physical systems all interact with each other. You need to be thinking not about individual moments; you need to be thinking about journeys and flows, and thinking about how a human being will naturally, without even thinking about it, move from one context to another using different devices, using physical objects, being in physical spaces. For me, it was very appealing, this idea that you can design more than just interactions in a way, more than just interactions on a screen. You can actually design other things that are more about the way we live and work and play.”

Read more…


A good nudge trumps a good prediction

Identifying the right evaluation methods is essential to successful machine learning.

Editor’s note: this is part of our investigation into analytic models and best practices for their selection, deployment, and evaluation.

We all know that a working predictive model is a powerful business weapon. By translating data into insights and subsequent actions, businesses can offer a better customer experience, retain more customers, and increase revenue. This is why companies are now allocating more resources to develop, or purchase, machine learning solutions.

While expectations for predictive analytics are sky high, the implementation of machine learning in businesses is not necessarily a smooth path. Interestingly, the problem often is not the quality of data or algorithms. I have worked with a number of companies that collected a lot of data; ensured the quality of the data; used research-proven algorithms implemented by well-educated data scientists; and yet, they failed to see beneficial outcomes. What went wrong? Doesn’t good data plus a good algorithm equal beneficial insights?

Read more…


Solving mysteries using predictive analytics

In-depth Strata community profile on Kira Radinsky

Kira Radinsky started coding at the age of four, when her mother and aunt encouraged her to excel at one of her favorite computer games by writing a few simple lines of code. Since then, she’s been a firecracker in the field of predictive analytics, building algorithms to improve business interactions and create a data-driven economy and, in the past, building systems to detect outbreaks of disease and social unrest around the world. She also gave a predictive analytics talk at the last Strata.

I had a conversation with Kira last month about her entry into the field and her most exciting moments thus far.

When did you first become interested in science?


Photo provided courtesy of Kira Radinsky

Kira Radinsky: When I was four or five, my mom bought me a computer game. In order to go to the next level, you had to solve simple math problems, which became increasingly harder with time. At one point I couldn’t solve one of the problems. Then I asked my aunt for help because she was a software engineer. She showed me how to write some very simple code in order to proceed to the next level in the game. This was my first time to actually code something.

In the army, I was a software engineer. I built big systems. I felt that I was contributing to my country and it was amazing for me. When I finished my service, I was accepted to the excellence program at the Technion [Israel Institute of Technology] because I had already started studying there when I was 15. I just continued on to a graduate degree.

I knew I wanted to do something in the field of artificial intelligence, because I really wanted to pursue the idea of using computers to make a global impact. I was really into that. I realized that the vast data amounts that we produce could be used to solve important problems.

In 2011, thousands of birds fell out of the sky on New Year’s Eve. People were writing, “We don’t know what’s going on.” It was a conundrum. A few days later, a hundred thousand fish washed up dead on the shore. Many people were saying that it was the end of the world because it was the end of the Mayan calendar!

Read more…


Mental Health Costs for 18-to-35 Year Olds: The Hidden Epidemic

Hypothesis-free data analysis turns up unexpected incidences of illness

This posting was written by guest author Arijit Sengupta, CEO of BeyondCore. Arijit will speak at Strata Rx 2013 on the kinds of data analysis discussed in this post.

Much of the effort in health reform in the United States has gone toward recruiting 18-to-35 year olds into the insurance pool so that the US economy and insurers can afford the Affordable Care Act (ACA). The assumption here is that health care costs will be less for this young population than for other people, but is this true? Our recent analysis of 6.8 million insured young adults, across 200,000 variable combinations, suggests that young adults may be more expensive to insure than we realize.

Our study shows a high occurrence of mental health diseases among 18-to-35 year olds who have insurance and therefore more affordable access to medical care. Moreover, expenses associated with mental health conditions are very high, especially when coupled with a physical ailment. As the previously uninsured 18-to-35 year olds get access to affordable care, we may see a similarly high rate of mental health diagnoses among this population. The bad news is that the true costs of insuring 18-to-35 year olds might be much higher than previously suspected. The good news is that previously undiagnosed and untreated mental health conditions may now actually get diagnosed and treated, creating a significant societal benefit.

Read more…


The Next “Top 5%”: Identifying patients for additional care through micro-segmentation

Health data can go beyond the averages and first order patient characteristics to find long-term trends

This article was written with Arijit Sengupta, CEO of BeyondCore. Tim and Arijit will speak at Strata Rx 2013 on the topic of this post.

Current healthcare cost prevention efforts focus on the top 1% of highest risk patients. As care coordination efforts expand to a larger set of the patient population, the critical question is: If you’re a care manager, which patients should you offer additional care to at any given point in time? Our research shows that focusing on patients with the highest risk scores or highest current costs creates suboptimal roadmaps. In this article we share an approach to predict patients whose costs are about to skyrocket, using a hypothesis-free micro-segmentation analysis. From there, working with physicians and care managers, we can formulate appropriate interventions.

Read more…


One-click analysis: Detecting and visualizing insights automatically

Arijit Sengupta of BeyondCore uncovers hidden relationships in public health data

The importance of visualizing data is universally recognized. Usually, though, the data is a passive input to some visualization tool, and users have to specify the precise graph they want to see. BeyondCore simplifies this process by automatically evaluating millions of variable combinations to determine which graphs are the most interesting and then highlighting these to users. In essence, BeyondCore automatically tells us the right questions to ask of our data.
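To make the idea of scanning variable combinations concrete, here is a minimal Python sketch. It is not BeyondCore's method, which the post does not describe. It assumes a deliberately crude notion of "interesting": a segment whose average outcome sits far from the overall average, weighted by the square root of the segment size. The function name `rank_segments`, the field names, and the toy records are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from statistics import mean


def rank_segments(records, outcome, variables, max_order=2, min_size=2):
    """Score every segment defined by up to `max_order` variables.

    Assumed scoring rule (hypothetical, for illustration only):
    |segment mean - overall mean| * sqrt(segment size).
    Returns (score, segment, gap, size) tuples, highest score first.
    """
    overall = mean(r[outcome] for r in records)
    scored = []
    for order in range(1, max_order + 1):
        for combo in combinations(variables, order):
            # Group outcome values by the values of the chosen variables.
            groups = defaultdict(list)
            for r in records:
                groups[tuple(r[v] for v in combo)].append(r[outcome])
            for key, values in groups.items():
                if len(values) < min_size:
                    continue  # skip segments too small to say anything
                gap = mean(values) - overall
                score = abs(gap) * sqrt(len(values))
                scored.append((score, dict(zip(combo, key)), gap, len(values)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy, made-up records: annual cost by age band and region.
records = [
    {"age": "18-35", "region": "north", "cost": 900},
    {"age": "18-35", "region": "north", "cost": 1100},
    {"age": "18-35", "region": "south", "cost": 200},
    {"age": "18-35", "region": "south", "cost": 300},
    {"age": "36-50", "region": "north", "cost": 250},
    {"age": "36-50", "region": "north", "cost": 350},
    {"age": "36-50", "region": "south", "cost": 300},
    {"age": "36-50", "region": "south", "cost": 200},
]

ranked = rank_segments(records, "cost", ["age", "region"])
top_score, top_segment, top_gap, top_size = ranked[0]
```

On this toy data the top-ranked segment is the two-variable combination of age 18-35 and region north. Neither variable alone ranks as high, which is the kind of pattern a user might not think to chart by hand. A real system would need a statistically sounder score and a way to cope with the combinatorial growth in segments.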

In this video, Arijit Sengupta, CEO of BeyondCore, describes how public health data can be analyzed in real time to discover anomalies and other intriguing relationships, making them readily accessible even to viewers without a statistical background. Arijit will be speaking at Strata Rx 2013 with Tim Darling of Objective Health, a McKinsey Solution for Healthcare Providers, on the topic of this post.

Read more…


Big data comes to the big screen

Using data science to predict the Oscars

By Michael Gold, Farsite

Sophisticated algorithms are not going to write the perfect script or crawl YouTube to find the next Justin Bieber (that last one I think we can all be thankful for!). But a model can predict the probability of a nominee winning the Oscar, and recently our model has Argo overtaking Lincoln as the likely winner of Best Picture. Every day on FarsiteForecast.com we’ve been describing applications of data science for the media and entertainment industry, illustrating how our models work, and updating the likely winners based on the outcomes of the Awards Season leading up to the Oscars.

Just as predictive analytics provides valuable decision-making tools in sectors from retail to healthcare to advocacy, data science can also empower smarter decisions for entertainment executives, which led us to launch the Oscar forecasting project. While the potential for data science to impact any organization is as unique as each company itself, we thought we’d offer a few use cases that have wide application for media and entertainment organizations.

Read more…


Top Stories: June 25-29, 2012

William Gibson's apt predictions, why C matters, and a vote against lightweight DRM.

This week on O'Reilly: James Turner noted that the corporate dystopia predicted in "Neuromancer" has come to pass, author David Griffith discussed C's continued popularity, and Joe Wikert explained why lightweight ebook DRM isn't viable.