Why Uber's data fascinates a neuroscientist

The unique relationship between a brain expert and a car-sharing service.

Matching cars for hire with people who want to get places may not be rocket science. But a background in neuroscience couldn’t hurt. Where 2012 speaker Bradley Voytek (@bradleyvoytek) has taken his experience as a neuroscience researcher to buzzy car-service company Uber, where he sees similarities between the connections in an urban landscape and those in the brain. Voytek’s role with Uber involves figuring out how to make sense of, and how to apply, the massive amounts of data that the car service and its customers generate. Wrangling that data can help Uber match up cars and passengers more quickly, and it has some other promising possibilities, too.

What are you learning from Uber’s real-time analytics?

Bradley Voytek: A lot of people think that Uber is just a car service, and that we figure out where to pick people up and where to take them. But as a cognitive neuroscientist, of course I’m interested in human behavior. To me, the coolest thing about what we can learn when we take a deeper look at Uber’s data is how people move around a city. We get a little glimpse at how people flow, what neighborhoods are connected, where people go to party on weekend nights. It’s fascinating.

What are some of the things you might do with that data?

Bradley Voytek: One of the great things about working for a startup like Uber is spit-balling ideas, talking about possibilities. While I can’t talk specifically about what else we might be able to do for our users using their data in terms of the business, you could easily imagine a lot of possibilities. My personal favorite “out there” idea is to ask drivers if they’d be willing to have a multi-sensor attached to their car that sampled air quality, temperature, and other environmental factors. We could make it so our driver partners are involved in “citizen science” in a sense.

How do you make the connection between incoming data, analysis, and business response?

Bradley Voytek: Managing supply is a critical issue. If we have a lot of users wanting a car, but we don’t have enough cars on the system, then our wait times increase and the chance of any one user getting a car decreases. This provides a less optimal user experience. So, if we start to see an unexpected increase in demand, we can have our ops team start calling to get more drivers on the system, for example. When we launch in a new city, we need to know where people want us. So, we take a look at where people have been checking us out and look for hotspots of anticipatory activity in a city to make sure we’re addressing our eager riders.
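As a rough illustration of what looking for hotspots of activity could involve, the sketch below bins latitude/longitude points into a coarse grid and reports the busiest cells. It is not Uber's method. The function name `hotspot_cells`, the cell size, and the sample coordinates are all assumptions made up for this example.

```python
from collections import Counter
from math import floor


def hotspot_cells(points, cell_size=0.01, top_n=3):
    """Bin (lat, lon) points into square grid cells and return the busiest cells.

    Returns a list of ((lat_index, lon_index), count) pairs, busiest first.
    cell_size is in degrees; 0.01 is an arbitrary choice for illustration.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    counts = Counter(
        (floor(lat / cell_size), floor(lon / cell_size)) for lat, lon in points
    )
    return counts.most_common(top_n)


# Hypothetical app-open locations (made-up points, not real usage data).
sample_points = [
    (37.7751, -122.4194),
    (37.7755, -122.4191),
    (37.7758, -122.4199),
    (37.8044, -122.2712),
    (37.7712, -122.4312),
]

for cell, count in hotspot_cells(sample_points):
    print(cell, count)
```

A fixed grid is the simplest possible choice here. A real analysis would more likely need to account for uneven population, time of day, and cell-boundary effects.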

Where Conference 2012 — O’Reilly’s Where Conference, being held April 2-4 in San Francisco, is where the people working on and using location technologies explore emerging trends in software development, tools, business strategies and marketing.

Have you opened up Uber’s data or compared it with other geo databases to learn new things?

Bradley Voytek: We anonymized some of our data a few months ago for a data visualization competition to see what people could do with it, but so far we haven’t shared our data too broadly. I’ve been working hard to mash up our data with other public data though, to see if I can uncover something cool. For example, a few months back I wanted to see if I could predict which neighborhoods in San Francisco had the most rides based on some demographic information. At first I thought the obvious answer would be population density, but what we see is that people don’t always take a car from their home, they take it for business or pleasure, from one social location to another. So, I used public crime data as a surrogate measure for the amount of “activity” in a neighborhood, and that predicted rides much better than population density.
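The comparison described above, checking which neighborhood-level measure tracks ride counts more closely, can be sketched with a plain Pearson correlation. This is only a minimal illustration, not the analysis Voytek ran. The helper names `pearson` and `rank_predictors` are hypothetical, and every number below is invented for the example rather than taken from Uber or from any city dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_predictors(target, predictors):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in predictors.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Invented values for five hypothetical neighborhoods (illustration only).
rides = [120, 340, 90, 410, 200]
predictors = {
    "population_density": [30, 25, 22, 27, 24],
    "incident_reports": [15, 40, 10, 52, 24],
}

for name, r in rank_predictors(rides, predictors):
    print(f"{name}: r = {r:+.2f}")
```

With only a handful of neighborhoods, a correlation like this is suggestive at best. It says nothing about why a measure tracks rides.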

How do you think your neuroscience background shapes the way you do your data work at Uber?

Bradley Voytek: It’s a two-way street. My neuroscience background has influenced the way I think about and work with data: I come from an electrophysiological background. I work with time-series data, so I tend to think about our data in terms of how metrics change over time. I think about cities as a series of connected nodes in a city-wide network, which is analogous to how I think about the brain: a complex network of connected neuronal hubs.

But I’ve also taken some of the visualization and analytic techniques I learned at Uber back into my neuroscience research, and I’ve even begun looking at geolocation data in some of my side projects. Specifically, my wife and I created a website called brainscanr.com, with some help from my friend Curtis Chambers, Uber’s head of engineering. The paper behind it is currently under peer review, but the main idea was to see if we could “map” relationships between neuroscientific topics spread across more than three million peer-reviewed publications. It’s an incredibly complex field, spanning psychology, biology, chemistry, medicine, computer science and artificial intelligence, and so on. There’s too much data. As I learned new tricks about data visualization and graph theory at Uber, I was able to go back to this project and improve on it. We’re trying to do two things: first, aggregate all of these disparate scientific findings into something more digestible (which is at the heart of big data and data visualization), and second, see if we can learn something new from these data (again, a core part of big data). So, instead of just visualizing relationships between topics, we’re looking at the statistical properties of the 500,000 connections between them to try to find places where (statistically) connections should exist, but do not. I’m calling this “semi-automated hypothesis generation.”
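One simple way to look for connections that "should exist, but do not" is the common-neighbors heuristic from link prediction: two topics that are not linked to each other, but share many linked topics, become candidates. The sketch below uses that heuristic as a stand-in. The interview does not say which statistics the brainscanr project uses, so this should not be read as its method. The function name `missing_link_candidates`, the `min_shared` threshold, and the placeholder topic labels are all assumptions.

```python
from itertools import combinations


def missing_link_candidates(edges, min_shared=2):
    """Rank unlinked node pairs by how many neighbors they share.

    edges is an iterable of undirected (a, b) pairs. Returns a list of
    (a, b, shared_neighbors) tuples, most shared neighbors first.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    candidates = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already connected, so not a "missing" link
        shared = neighbors[a] & neighbors[b]
        if len(shared) >= min_shared:
            candidates.append((a, b, sorted(shared)))

    # Most shared neighbors first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda c: (-len(c[2]), c[0], c[1]))
    return candidates


# Placeholder topic graph (labels are hypothetical, not real literature links).
edges = [
    ("topic_a", "topic_c"),
    ("topic_a", "topic_d"),
    ("topic_b", "topic_c"),
    ("topic_b", "topic_d"),
    ("topic_c", "topic_d"),
    ("topic_d", "topic_e"),
]

for a, b, shared in missing_link_candidates(edges):
    print(f"{a} and {b} are unlinked but share {len(shared)} neighbors: {shared}")
```

Checking every pair is quadratic in the number of topics, which is fine for a toy graph but would need a smarter approach at the scale of hundreds of thousands of connections.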

You ran out of time during your Ignite presentation while you were discussing an idea about correlating reaction times (of a brain task, via your work with Lumosity) with automobile accidents. Can you tell us what you found?

Bradley Voytek: Okay, this is very preliminary, but obviously exciting. The data I’m looking at with Lumosity measures attention and cognitive control. They’ve shared tens of thousands of users’ worth of data with me from all over the world. After learning to work with geolocation data at Uber, I began to think about what kinds of location-based questions I could answer with the Lumosity data. Given that we’re looking at attention and cognitive control, I thought maybe less “attentive” states would have a slightly increased risk of car accidents.

And that’s what I’m finding, but with the huge caveats that these data are preliminary, not peer-reviewed, and, of course, complex in that there are a lot of potential factors that may explain the apparent relationship between the Lumos attention measure and car accidents. As I’ve gathered more state-level data, I’ve started to see interesting correlations among factors like state, age, income, and health, all with the usual caveats about interpreting correlations as causation.

Voytek’s Ignite presentation is available in the following video.
