Reuters CEO sees "semantic web" in its future

Money:Tech logo

At Money:Tech yesterday, I did an on-stage interview with Devin Wenig, the charismatic CEO-to-be of Reuters (following the still-not completed merger with Thomson). Devin highlighted what he considers two big trends hitting financial (and other professional) data:

The impact of consumer media on professional media. As young people who grew up on the web hit the trading floor, they aren’t going to be satisfied with text. Reuters needs to combine text, video, photos, internet and mobile, into a rich, interactive information flow. However, he doesn’t see direct competition from consumer media (including Google), arguing that professionals need richer, more curated information sources.
The end of benefits from decreasing the time it takes for news to hit the market. He describes the quest for zero latency in news, from the telegraph and early stock tickers and the news business that Reuters pioneered through today’s electronic trading systems. (Dale Dougherty wrote about this yesterday, in a story about the history of the Associated Press.) As we reach the end of that trend, with information disseminated instantly to the market via the internet, he increasingly sees Reuters’ job to be making connections, going from news to insight. He sees semantic markup to make it easier to follow paths of meaning through the data as an important part of Reuters’ future.

Devin’s point about the semantic web was thought-provoking. Ultimately, Reuters’ news is the raw material for analysis and application by investors and downstream news organizations. Adding metadata to make that job of analysis easier for those building additional value on top of your product is a really interesting way to view the publishing opportunity. If you don’t think of what you produce as the “final product” but rather as a step in an information pipeline, what do you do differently to add value for downstream consumers? In Reuters’ case, Devin thinks you add hooks to make your information more programmable. This is a really important insight, and one I’m going to be chewing on for some time.

That’s a really good case for the Semantic Web, and one that I hadn’t understood before. It’s not about having end users add semantics for the love of it. That’s just overhead, which is why I’ve always argued against it, preferring the kind of implicit semantics that come from applications that harness user self-interest. But professional publishers definitely have an incentive to add semantics if their ultimate consumer is not just reading what they produce, but processing it in increasingly sophisticated ways.

But even if Devin is right about one role of a publisher being to add value via metadata, I don’t think he should discount the statistical, computer-aided curation that has proven so powerful on the consumer internet. (Curation is that part of the publisher’s job that consists of choosing and arranging the content that is presented to the ultimate consumer — reading the slushpile, if you will, so that others don’t have to, and making sure that the most important material gets its day in the sun.)

Explicit semantic markup has thus far not proven to be anywhere near as powerful as techniques for mining implicit semantics, or the design of applications in which more implicit semantics are created by users simply by “living as and where they live.” (Facebook’s “social graph” is the latest example of this kind of implicit semantic application.) Much success on the consumer internet has resulted from innovations in curation. After all, PageRank is a kind of automated curation via collective intelligence, as is Flickr’s interestingness algorithm, user voting on slashdot and digg stories, and even community editing of Wikipedia.

Devin is completely right, though, about consumer media changing expectations for professional media. I see a lot of future upset in enterprise software as well as in media, as consumer phenomena like mashups and social networking change ease-of-use expectations of applications like CRM and business reporting. But it seems to me that mainstream media needs to learn not just about multimedia, but also about new sources of information.

A huge part of the generational change is a change in expectations of transparency, informality, and sources of authority. So when Devin says that Google isn’t terribly useful for professional uses like financial research, I think he misses just how much authority bloggers are getting as reliable news sources, and how people are using tools like iGoogle to pull together targeted RSS data feeds. Raw Google results may be less useful than Reuters-filtered results, but how about community or expert-curated Google results? Just as Reuters’ customers are adding value to the Reuters data stream, they are capable of adding value to the Google data stream. And there are increasingly powerful tools for managing that stream.

What I think does ultimately matter is the ability of professional media to build specialized interfaces and vertical data stores that are suited to their niche, hopefully harnessing data and services from the consumer internet, and mashing them up with specialized, perhaps private, data stores. Put that together with metadata for programmable re-use, and you may really have something.

On the end of timing arbitrage due to zero latency in information distribution, I have to disagree with Devin. There’s still a huge amount of information that never hits the market because people aren’t paying attention to the right things. David Leinweber, who spoke at the conference yesterday, gave a great example, of a huge price move in the stock of a pharmaceutical company as a result of news of successful clinical trials for a cancer vaccine. Every single news story on the subject resulted from the company’s press release, yet there was information available on the web months in advance of the story. Leinwebber’s point: it isn’t about speeding up news distribution, but about getting better access to sources of the news. As he said: “You have to get the news before the news people get there.”

Even in areas like company financials, the bread and butter of equity analysis, is there any reason why earnings are still reported only quarterly? There may one day be a company that breaks ranks, and shows its data in real-time (as it is increasingly coming to company executives.) The expectation of radical transparency may one day reach even into this relic of the 19th century.

I also think there’s a huge opportunity to get to data sooner via the sensor revolution. When phones report location (disclosure: O’Reilly AlphaTech Ventures investment), when phones listen to ambient sound, when credit cards report spending patterns (disclosure: OATV investment), when cars report their miles traveled, when we’re increasingly turning every device into a sensor for the global brain, there will be more and more sources of data to be mined.

And yes, we’ll need humans (at least for a while) as the last mile to extract meaning from all that data. But we’ll want those humans to be augmented with tools that notice patterns and exceptions as they happen, not just after the “news” hits.