Dating with data

OkCupid CEO Sam Yagan on how data shapes the dating business.

OkCupid is a free dating site with seven million users. The site’s blog, OkTrends, mines data from those users to tackle important subjects like “The case for an older woman” and “The REAL ‘stuff white people like’.”

Beyond clever headlines, OkCupid also uses an unusual pedigree to separate itself from the dating site pack: The business was founded by four Harvard-educated mathematicians.

“It probably scared people when they first heard that four math majors were starting a dating site,” said CEO Sam Yagan during a recent interview. But the founders’ backgrounds greatly influenced how they approached the problem of dating.

“A lot of other dating sites are based on psychology,” Yagan said. “The fundamental premise of a site like eHarmony is that they know the answer. Our approach to dating isn’t that there’s some psychological theory that will be the answer to all your problems. We think that dating is a problem to be solved using data and analytics. There is no magic formula that can help everyone to find love. Instead, we bring value by building a decent-sized platform that allows people to provide information that helps us to customize a match algorithm to each person’s needs.”

OkCupid works by having users state basic preferences and answer questions like “Is it wrong to spank a child who’s been bad?” Users are matched based on the overlap of their answers and how important each question is to both users.
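To make that matching idea concrete, here is a minimal sketch in Python. It is not OkCupid's actual code: the importance labels and weights, the function names `satisfaction` and `match_score`, the sample users, and the use of a geometric mean to combine the two directions are all assumptions chosen for illustration. Each user lists the answers they would accept for a question and how much the question matters to them; the score rewards overlap on the questions that matter to both sides.

```python
from math import sqrt

# Hypothetical importance weights; the article gives no actual values.
IMPORTANCE = {"irrelevant": 0, "somewhat": 1, "very": 10}


def satisfaction(prefs, answers):
    """Share of one user's importance-weighted points earned by another user.

    prefs:   question -> (set of acceptable answers, importance label)
    answers: question -> the other user's own answer
    Only questions the other user has answered are counted.
    """
    earned = possible = 0
    for question, (acceptable, importance) in prefs.items():
        if question not in answers:
            continue
        weight = IMPORTANCE[importance]
        possible += weight
        if answers[question] in acceptable:
            earned += weight
    return earned / possible if possible else 0.0


def match_score(a_prefs, a_answers, b_prefs, b_answers):
    """Combine both directions; a geometric mean is low if either side is unhappy."""
    return sqrt(satisfaction(a_prefs, b_answers) * satisfaction(b_prefs, a_answers))


# Made-up users for illustration only.
alice_prefs = {"spank": ({"no"}, "very"), "smoke": ({"no", "sometimes"}, "somewhat")}
alice_answers = {"spank": "no", "smoke": "no"}
bob_prefs = {"spank": ({"no"}, "somewhat"), "smoke": ({"no"}, "very")}
bob_answers = {"spank": "no", "smoke": "yes"}

score = match_score(alice_prefs, alice_answers, bob_prefs, bob_answers)
print(round(score, 3))  # 0.953
```

In this example Bob misses only the question Alice cares least about, so her satisfaction is 10/11, while Alice satisfies all of Bob's preferences; the combined score stays high.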

Yagan said data was built into the business model from the beginning. “We knew from the time we started the company that the data we were generating would have three purposes: helping us match people up, attracting advertisers since that was the core of our revenue model, and that the data would also be interesting socially.”

In 2007, the company hired a PR firm to publicize some of its findings, such as the fact that when gas prices rise, users narrow the search radius for matches. “We called dozens of reporters and nobody cared,” Yagan said. So OkCupid fired the PR firm and started publishing its findings on the OkTrends blog. The blog has thus far doubled traffic to the site.

“The blog is partly an advice column, but instead of being written by a psychologist, the data writes itself,” Yagan said. “For example, we don’t tell you that you should or should not use a flash for your profile photo. We just tell you that if you use a flash you’ll look seven years older.”

I asked Yagan about the data on which OkTrends draws. “We have people’s registration data,” he said. “Then we have stated preferences; the answers that people give to the questions we ask them. We use that kind of data occasionally, but it’s not the core difference that we have. The core difference is in the category of revealed preferences. Imagine if you had a video camera in every bar and you could observe every interaction between two people and see the success rate of that interaction. We essentially have that video camera on our site.”

The reason revealed preferences are so important is that they track real-world behavior — what people really want rather than what they say they want. “When you get 12 messages and you only reply to three of them, you are voting with your time,” Yagan said. “Or when a guy is shorter than you, you don’t reply.”
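As a rough illustration of how a revealed preference might be measured, the sketch below computes reply rates from a message log, grouped by an attribute of the sender. The record layout (`sender_taller`, `replied`), the function name `reply_rate_by`, and the sample inbox are hypothetical; the article does not describe OkCupid's data model. The inbox mirrors Yagan's example of 12 messages and three replies.

```python
from collections import defaultdict


def reply_rate_by(messages, attribute):
    """Return the fraction of messages that got a reply, per attribute value."""
    totals = defaultdict(int)
    replies = defaultdict(int)
    for message in messages:
        key = message[attribute]
        totals[key] += 1
        if message["replied"]:
            replies[key] += 1
    return {key: replies[key] / totals[key] for key in totals}


# Made-up inbox: 12 messages received, 3 answered (all from taller senders).
inbox = (
    [{"sender_taller": True, "replied": True}] * 3
    + [{"sender_taller": True, "replied": False}] * 2
    + [{"sender_taller": False, "replied": False}] * 7
)

overall = sum(m["replied"] for m in inbox) / len(inbox)
by_height = reply_rate_by(inbox, "sender_taller")
print(overall)    # 0.25
print(by_height)
```

The point of the grouping is that the recipient never has to state a height preference; it shows up in which messages get answered.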

Mobile adds a new revealed preferences dimension for OkCupid. “As our product gets more mobile and location-aware, we are more likely to be on that date with them,” Yagan said. “Then we can model the kinds of conversations on the site that lead to an in-person meeting.” OkCupid can currently track the five million messages sent every week on the site as well as other revealed preferences, like ratings of profiles.

According to Yagan, OkCupid doesn’t use sophisticated data mining or analytics tools: “Most of it can be done by querying the database and crunching numbers in Excel. The fact that we have four math majors and a full-time statistician means that we take that number crunching very seriously.”


  • Richard

    I am a psychologist and just wanted to point out that psychology is based on theories which were falsified by studies and their underlying statistical calculations.
    It sounded like Mr Yagan said that psychology is something people make up and don’t prove, because the OkCupid approach is, by contrast, data and analytics driven. Well, it is quite a psychological approach because *it is* data and analytics driven, like psychology.
    These other coupling sites seem to present themselves as using a psychological approach to matching the right people, but without established psychological theories based on hard data, they are just making things up.

  • Steve Newcomb

    Maybe it works (to whatever extent it works) because of the placebo effect. Like placebos, these dating site sugar pills come in different colors. Some even cloak themselves in Jesus. A Jesus-y pill is more likely to fix up Christians. Those who trust computers will be happy to have computers make decisions for them.

  • Because I used to write software, I occasionally sign up under a pseudonym to check out the SQL queries and matches running (I have written enough of those in my time) and to watch the aggregation processes. OkCupid is very logical and algorithmically mathematical.

    I had to laugh my fanny off after a couple of weeks when I was sent a “flow sheet”. It is what it is, and since there’s a surprise element when I check out web and other software, it is funny that in real life, if one wants to meet a partner, the last thing they think about is a flow sheet :) It’s just another way of aggregating web data and running queries :)

    What I have found, though, is that the conversations are from people who live and die on these text boxes on the web, and it is work to keep up with all of this, to the point where one just wants to end the participation and meet folks the old-fashioned way. When relationships get that complicated, as everything else is with data and algorithms today, it’s an area that not only loses people but also innocently hurts some human ethics along the line.

    It is a good way to meet someone who would normally not come across your path. As soon as there’s any interest, shut off the data and go talk to and meet the person before you become so overwhelmed with the amount of data that you walk away with nothing :) These sites should be very short term and not take up a lot of personal time if they are to be effective and not end up being a distraction or disruption.

  • Hmmm … maybe statistics can also answer a few other questions. Like when a woman cuts her hair short but still uses primary photos from when she had long hair. Or how about a date indicator on the photo so it can flag photos that are 5 years old :)

  • I found the data fascinating! I think it’s wonderful that this information can be gathered and presented in graphic form. Thanks for sharing.

    I, too, am a psychologist. I love data and appreciate that I am able to make my living working with the science of observing human interaction to see what works and what doesn’t. Helpful psychology is research and results driven, not conjecture.

    In an effort to help others work with their own data, so to speak, I developed a tool for individuals and couples to gather information about their personal preferences in and around relationships. I found that asking and answering deep questions of oneself, while in a dating relationship, allows a person to see, literally, how their relationship stacks up. TheQuestions is an iPhone app that invites one to look closely at personal preferences and weigh each with regard to the person they are dating. (see

    It’s kind of a scientific approach to vetting a relationship, using some psychology behind the phrasing of each question. But mostly TheQuestions relies on the results, the data, to tell the important story. The answers bring insight to a relationship, or to the preferences involved in making and maintaining good matches. It also lets a person know, in a very visible way, what they are buying into when they choose to deepen their investment. After answering the 76 questions there is a bar graph of results, that in some ways attests to the viability of the relationship in question.

    This approach helps balance the emotional aspects of infatuation and love with the cold hard facts of reality and reason – on many levels, the numbers don’t lie! It’s this balance of emotion and reason that leads to long-lasting partnership. (Mine turns 19 this summer!).

  • Great article. I must say I am shocked to learn that four math majors actually started a dating site, because they are more of the mental sort: a very unlikely kind of people to start a dating site, which is more emotional in nature … it is kind of surprising.