What the data can tell us about dating and other social congregation

Valentine’s Day turned out to be a good time to discuss data crunching of online dating. Kevin Lewis, a PhD candidate in sociology and Berkman Center Fellow, drew an overflow room today for his talk “Mate Choice in an Online Dating Site.” It’s yet another example of how, as people go online, they leave a trail of data that could never be captured before.

Here are some examples of how traditional researchers are restricted:

  • They can get marriage data, but have much less data about dating, cohabitation without marriage, and other non-traditional arrangements that are increasingly common. Dating sites let us in at a much earlier stage in a relationship that may or may not lead to marriage.

  • They can measure certain recorded demographics such as age and race, but miss a huge range of criteria by which people evaluate potential mates. People enter lots of interesting facts about themselves and their hoped-for mates on dating sites.

  • Because researchers miss the initial contacts, they have trouble tracing back from a result (marriage) to the criteria used by the dating couples.

As an example of the last problem, Lewis mentioned the observation that people usually date and marry others with similar levels of formal education. Actually, researchers have long hypothesized that men don’t care much about women’s educational levels and would be willing to date and marry outside their own. It’s the women who care, and since they rule out men with much higher or lower educational levels, we end up with the current results.

Now Lewis can cite concrete data supporting that hypothesis. On a dating site, men initiate and respond to contacts with women of many different educational levels. But the women don’t initiate many contacts outside their own level, and don’t respond to contacts from men outside that level.
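To make that comparison concrete, here is a minimal Python sketch of the kind of tally involved: for each sender gender, the share of initiated contacts that cross educational levels. The record layout, the function name `cross_level_share`, and the sample rows are all invented for illustration; nothing here reflects OkCupid's actual schema or the method Lewis used.

```python
from collections import defaultdict

# Hypothetical records: (sender_gender, sender_education, recipient_education).
# These rows are made up for illustration; they are not from Lewis's data set.
SAMPLE_CONTACTS = [
    ("M", "college", "college"),
    ("M", "college", "high school"),
    ("M", "graduate", "college"),
    ("M", "high school", "high school"),
    ("F", "college", "college"),
    ("F", "graduate", "graduate"),
    ("F", "college", "graduate"),
    ("F", "high school", "high school"),
]


def cross_level_share(contacts):
    """Return, per sender gender, the fraction of initiated contacts
    sent to someone whose education level differs from the sender's."""
    totals = defaultdict(int)
    crossing = defaultdict(int)
    for gender, sender_edu, recipient_edu in contacts:
        totals[gender] += 1
        if sender_edu != recipient_edu:
            crossing[gender] += 1
    return {gender: crossing[gender] / totals[gender] for gender in totals}


shares = cross_level_share(SAMPLE_CONTACTS)
for gender, share in sorted(shares.items()):
    print(f"{gender}: {share:.0%} of initiated contacts cross education levels")
```

A real analysis would also have to look at responses, not just first contacts, and account for how many people at each level are available to contact in the first place.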

How did Lewis conduct his research? Briefly, he persuaded OkCupid to give him a large data set stripped of free-text fields, but containing information on race, religion, and several other criteria. He chose data in the New York City area for heterosexual couples. Considering that 22% of heterosexual adults have found their current partners through online sites (the figure is even higher for same-sex couples: 61%), this is a lot of valuable data.

Of course, there are risks in extrapolating from this data set. Admittedly, OkCupid users tend to be younger and more Internet-savvy than the overall dating population. It’s hard to tell whether some criterion is truly a determining factor or a consequence of some other factor (for instance, educational level is correlated with age). Still, Lewis controlled for variables a good deal and feels there is a lot of statistical validity to his findings. He also admitted that the calculations put a big strain on servers at Harvard (here’s a person who needs to be on a grid!).

As just one other example, he documented a lot of contacts across racial lines, more than one might expect. But there were definite patterns. For instance, black women received far fewer contacts from other races than most groups did. In this way, the data on dating gives us a look at our values and choices in other forms of social interaction, not just romance.
