What lies ahead: Data

Tim O’Reilly recently offered his thoughts and predictions for a variety of topics we cover regularly on Radar. I’ll be posting highlights from our conversation throughout the week. — Mac

Are companies catching on to the importance of data?

Tim O’Reilly: For a long time, data was a secret hiding in plain sight. It became clear to me quite a while ago that it was the key to competitive advantage in the Internet era. That was one of the points in my Web 2.0 paper back in 2005. It’s pretty clear that everybody knows about it now. There are “chief data scientists” at companies like LinkedIn and Bit.ly. Data and algorithms are at the heart of what so many companies are doing, and that’s just going to accelerate.

There’s more data every day, and we’re going to see creative applications that use data in new ways. Take Square, for example. It’s a payment company that’s hoping to do some degree of risk mitigation via social network analysis. Will that really work? Who knows? Even Google is struggling with the limits of algorithmic curation.

There was a story in the New York Times recently about a guy who figured out that getting lots of negative comments led to a high PageRank. That raises new questions. Google doesn’t like to do manual intervention, so they’re saying to themselves, “How can we correct this algorithmically?” [Note: Google responded with an algorithmic solution.]

I’m not privy to what’s happening inside Google’s search quality team, but I think there are probably new sources of data that Google could mine to improve results. I’ve been urging Google to partner with people who have sources of data that aren’t scrapeable.

Along those lines, I think more data cooperation is in the future. There are things multiple companies can accomplish working together that they couldn’t do alone. It occurs to me that the era of Google was the era in which people didn’t realize how valuable data was. A lot of data was there for the taking. That’s not true anymore. There are sources of data that are now guarded.

Facebook, as an example, is not going to just let a company like Google take its data to improve Google’s own results. Facebook has its own uses for that data. Meanwhile, Google, which was allowing Facebook users to extract their Gmail contacts to seed their Facebook friend lists, responded by setting their own limits. That’s why I see more data-sharing agreements in the future, and more data licensing.

The contacts battle between Google and Facebook is an early example of the new calculus of data. I don’t know why Google didn’t stand up sooner and say, “If you’re scraping our data to fill out your network, why can’t we do the same in reverse?” That’s a really good question. It’s one thing if Facebook wants to keep their data private. But it’s another thing if they take from the “data commons” and don’t give anything back.

I also anticipate big open data movements. These will be different from the politically or religiously motivated open data movements of the past. The Google/Facebook conflict is an example of an open data battle that’s not motivated by religion or principle. It’s motivated by utility.

How will an influx of data change business analytics?

Tim O’Reilly: Jeff Hawkins says the brain is a prediction engine. The reason you stumble if a step isn’t where you expect it to be is that your brain has performed a prediction and you’re acting on that prediction.

Online services are becoming intelligent in similar ways. For example, Google’s original competitive advantage in advertising came from their ability to predict which ads were the most likely to be clicked on. People don’t really grasp the significance of that. Google had a better prediction engine, which means they were smarter. Having a better prediction engine is literally, in some sense, the definition of being smarter. You have a better map of what’s true than the next guy.

The old prediction engine was built on business intelligence: analytics and reports that people study. The new prediction engine is reflex. It’s autonomic. The new engine is at work when Google is running a real-time auction, figuring out which ad is going to appear and which one is going to give them the most money. The engine is present when someone on Wall Street is building real-time bid/ask algorithms to identify who they’re going to sell shares to. These examples are built on predictive analytics that are managed automatically by a machine, not by a person studying a report.
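To make the ad example concrete, here is a minimal sketch of choosing an ad by predicted value rather than by bid alone. It assumes a simplified rule (expected revenue equals bid times predicted click probability) and is not a description of Google's actual auction, which the interview does not detail. The `Ad` class, the function names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float            # what the advertiser pays per click (hypothetical units)
    predicted_ctr: float  # the model's estimated click probability, between 0 and 1


def expected_revenue(ad: Ad) -> float:
    """Expected revenue per impression under the assumed rule: bid * predicted CTR."""
    return ad.bid * ad.predicted_ctr


def pick_ad(ads: list[Ad]) -> Ad:
    """Pick the ad with the highest expected revenue, not the highest bid."""
    return max(ads, key=expected_revenue)


# Illustrative data only: the high bidder loses because it is rarely clicked.
ads = [
    Ad("high_bid_rarely_clicked", bid=2.00, predicted_ctr=0.01),
    Ad("low_bid_often_clicked", bid=0.50, predicted_ctr=0.08),
]
print(pick_ad(ads).name)
```

Under this assumed rule the decision is made by the machine on every request, which is the "reflex" quality described above: the quality of the click prediction, not a person reading a report, determines which ad appears.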

Predictive analytics is an area that’s worth thinking about in the years ahead: how it works, how you become proficient at it, how it can transform fields, and how it might conflict with existing business models.

Healthcare offers an example of the potential conflicts. There is no question in my mind that there’s a huge opportunity for predictive analytics in healthcare and in the promise of personalized medicine. Certain therapies work better than others, and those conclusions are in the data, but we don’t reimburse based on what works. Imagine if Medicare worked like Google. They would say, “You can use any medicine you want, but we’re going to reimburse at the rate of the lowest one.” Pretty soon, the doctors would be using the drugs that cost the least. That’s opposed to what we have now, where doctors get drugs pushed on them by drug companies. Doctors end up recommending particular therapies that the data says don’t work any better, but cost three times as much. The business models of pharmaceutical companies are all dependent on this market aberration. If we moved to a predictive analytics regime, we would actually cut a lot of cost and make the system work better. But how do you get there?
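The reimbursement rule in that thought experiment can be written out as simple arithmetic. The sketch below assumes the drugs listed are equally effective according to the data and that the payer covers only the price of the cheapest one. The function name, drug names, and prices are hypothetical and do not describe any actual Medicare policy.

```python
def reimbursement(prices: dict[str, float], chosen: str) -> tuple[float, float]:
    """Return (amount reimbursed, amount left uncovered) for the chosen drug.

    Assumes every drug in `prices` works equally well, so the payer
    reimburses at the price of the cheapest option regardless of choice.
    """
    reference_price = min(prices.values())
    uncovered = prices[chosen] - reference_price
    return reference_price, uncovered


# Illustrative prices only: one drug costs three times as much as the other.
prices = {"drug_a": 30.0, "drug_b": 90.0}
print(reimbursement(prices, "drug_a"))
print(reimbursement(prices, "drug_b"))
```

With these made-up numbers, prescribing the more expensive drug leaves 60.0 uncovered while the cheaper one leaves nothing, which is the incentive the passage has in mind.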

Predictive analytics will chip away at new areas and create opportunities. A great example is a company we had on stage at Gov 2.0 Summit, PASSUR Aerospace, that has been managing predictive analytics for airline on-time arrival. For 10 years or so, they’ve been tracking every commercial airline flight in the U.S. and correlating it with data like weather and other events. They’re better than the airlines or the FAA at predicting when the planes will actually arrive. A number of airlines have hired them to help manage expectations about flight arrivals.

Mobile sensors have come up in many of your recent talks. Why are sensors important?

Tim O’Reilly: Recently I was talking with Bryce Roberts about how sensors connect a bunch of OATV’s investments. Path Intelligence is using cell phone check-ins to count people in shopping malls and other locations. Foursquare uses sensors to check you in. RunKeeper, which tracks when you run, is another sensor-based application.

The idea that the smartphone is a mobile sensor platform is absolutely central to my thinking about the future. And it should be central to everyone’s thinking, in my opinion, because the way that we learn to use the sensors in our phones and other devices is going to be one of the areas where breakthroughs will happen.

(Note: Tim will share more thoughts on mobile sensors and the opportunities they create in a post coming later this week.)

Next in this series: What lies ahead in publishing
