Network Effects in Data

Nick Carr’s difficulty in understanding my argument that cloud computing is likely to end up a low-margin business unless companies find some way to harness the network effects that are the heart of Web 2.0 made me realize that I use the term “network effects” somewhat differently, and not in the simplistic way many people understand it.

Here’s Nick:

Let’s stop here, and take a look at the big kahuna on the Net, Google, which O’Reilly lists as the first example of a business that has grown to dominance thanks to the network effect. Is the network effect really the main engine fueling Google’s dominance of the search market? I would argue that it certainly is not….

The intelligence embedded in a link is equally valuable to Google whether the person who wrote the link is a Google user or not. In his new post, in other words, O’Reilly is confusing “harnessing collective intelligence” with “getting better the more people use them.” They are not the same thing. The fact that my neighbor uses Google’s search engine, rather than Yahoo’s or Microsoft’s, does not increase the value of Google’s search engine to me, at least not in the way that my neighbor’s use of the telephone network or of Facebook would increase the value of those services to me. The network effect underpins and explains the value of the telephone network and Facebook; it does not underpin or explain the value of Google. (Indeed, if everyone other than myself stopped using Google’s search engine tomorrow, that would not decrease Google’s value to me as a user.)

Ah, I say to myself: Nick only sees first-order network effects, what you might call endogamous networks, those that require the user to be part of the tribe. Thus, phone networks, and networks like Facebook. But the internet is an exogamous network; its benefits increase with the extent to which it reaches out to new groups, increasing cross-breeding and thus the total robustness and variety of the gene pool. This is why links matter, why web services matter, because they extend the reach of the network. Understanding the benefit of exogamous networks requires a more subtle calculus than Nick is applying. It’s not necessarily that you benefit directly from belonging, but the fact that you belong allows others to harvest the benefit of your participation.

Consider Google: The underlying network that Google is based on is one that they neither own nor control, the web itself. It has both endogamous and exogamous elements. No one controls it; its richness and diversity depend on that fact. And yet, there is a benefit to belonging. If there weren’t, sites would use their robots.txt file to tell Google and other search engines to stop spidering them.
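As a small illustration of that opt-out, here is a sketch using Python's standard urllib.robotparser module. The helper name allows_crawler, the two sample files, and the example.com URL are made up for this example; only the robots.txt syntax and the parser calls are real.

```python
from urllib.robotparser import RobotFileParser


def allows_crawler(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration; it parses the text directly
    instead of downloading it from a site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site that wants no part of the search network: every crawler is
# told to stay away from every path.
OPT_OUT = """\
User-agent: *
Disallow: /
"""

# A site that accepts being spidered: an empty Disallow blocks nothing.
OPT_IN = """\
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    page = "http://example.com/page"
    print("opt-out site crawlable:", allows_crawler(OPT_OUT, "Googlebot", page))
    print("opt-in site crawlable:", allows_crawler(OPT_IN, "Googlebot", page))
```

Two lines of text are all it takes to leave, which is the point: most sites never write them.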

Yes, you might say: but other search engines have access to that same network. And here, of course, is the first lesson: Google is better at spidering that network than their competitors. They thus benefit more powerfully from the network that we are all collectively building via our web publishing and cross-linking. Nick correctly points out that Google has built superior systems, and that these are the source of their competitive advantage. But that’s a diversion. Why did they build those superior systems? To harness the power that was hidden in the network more effectively than their competitors.

Google’s second network effect advantage is PageRank. As Robert Scoble so insightfully noted back in 2003, we contribute to Google with every link. Google realized that there was an additional layer of meaning hidden in the network. Far from being a contradiction to my network-effect hypothesis, as Nick claims, this is a validation of it. Advantage came to Google for seeing more deeply into the nature of the network, and building tools to harvest and apply data that was hidden in the network graph.
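To make "data hidden in the network graph" concrete, here is a minimal sketch of the textbook PageRank iteration. It is not Google's production system: the damping factor of 0.85, the fixed iteration count, the handling of pages with no outgoing links, and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly
        # over all pages (one common convention, assumed here).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank, in equal shares, to the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical toy web: three pages link to "c", and "c" links to "a".
TOY_WEB = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    scores = pagerank(TOY_WEB)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In the toy web, "c" comes out on top because everyone links to it, and "a" outranks "b" and "d" only because "c" links to it. None of the authors of those links needed to be a user of the search engine for their links to count.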

Google’s third (and most profitable) network effect insight was, of course, the ad auction. And once again, Nick misses the point. He says:

Now it’s true that, if you want to define market liquidity as a type of network effect, Google enjoys a strong network effect on the advertising side of its business (which is where it makes its money), but it would be a mistake to say that the advertising-side network effect has anything to do with Google’s dominance of the searches of web users.

It isn’t that the advertising-side network effect has anything to do with Google’s dominance of search, but rather, that Google’s dominance of search is central to the design of their ad auction. You see, while Yahoo! (née Overture) sold keyword advertising to the highest bidder, Google realized that they could mine their users’ clickstream activity to predict which ads would be most likely to be clicked on, and by what ratio, and thus sell to the best combination of price and actual click-through. Thus: higher revenue, more ability to invest in infrastructure, better results for advertisers and users, thus more users, thus better data, thus better results for both organic search and advertising (both of which do, in fact, matter to users, no matter what Nick thinks).
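The difference between the two auction designs can be sketched in a few lines. This is a deliberately simplified model, not the actual mechanics of either company's system: the Ad class, the two ranking functions, and the sample bids and click-through rates are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str
    bid: float            # price the advertiser pays per click
    predicted_ctr: float  # estimated chance a viewer clicks, learned from clickstream data


def rank_by_bid(ads):
    """Highest bidder first, ignoring how often the ad is clicked."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def rank_by_expected_revenue(ads):
    """Best combination of price and click-through first.

    bid * predicted_ctr is the expected revenue each time the ad is shown.
    """
    return sorted(ads, key=lambda ad: ad.bid * ad.predicted_ctr, reverse=True)


# Hypothetical ads: one bids high but is rarely clicked,
# the other bids low but is clicked ten times as often.
ADS = [
    Ad("high_bid_low_ctr", bid=2.00, predicted_ctr=0.01),
    Ad("low_bid_high_ctr", bid=0.50, predicted_ctr=0.10),
]

if __name__ == "__main__":
    print("by bid:", [ad.name for ad in rank_by_bid(ADS)])
    print("by expected revenue:", [ad.name for ad in rank_by_expected_revenue(ADS)])
```

With these made-up numbers the highest bidder earns about 2 cents per showing while the cheaper, more relevant ad earns about 5 cents, so the two rankings disagree. The predicted click-through rate can only come from observing many users, which is where the data feeds back into the auction.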

And of course, from there, you can also see other areas in which Google (and their competitors) are doing just this, from Google Docs and Spreadsheets (which exhibits the obvious kind of network effects that Nick is comfortable with), to mining clickstream data, to machine translation.

In short, Google is the ultimate network effects machine. “Harnessing collective intelligence” isn’t a different idea from network effects, as Nick argues. It is in fact the science of network effects – understanding and applying the implications of networks.

I want to emphasize one more point: the heart of my argument about Web 2.0 is that the network effects that matter today are network effects in data. My thought process (outlined in The Open Source Paradigm Shift and then What is Web 2.0?) went something like this:

The consequence of IBM’s design of a personal computer made out of commodity, off-the-shelf parts was to drive attractive margins out of hardware and into software, via Clayton Christensen’s “law of conservation of attractive profits.” Hardware became a low margin business; software became a very high margin business.
Open source software and the standardized protocols of the Internet are doing the same thing to software. Margins will go down in software, but per the law of conservation of attractive profits, this means that they will go up somewhere else. Where?
The next layer of attractive profits will accrue to companies that build data-backed applications in which the data gets better the more people use the system. This is what I’ve called Web 2.0.

It’s network effects (perhaps more simply described as virtuous circles) in data that ultimately matter, not network effects per se. Nick probably wouldn’t think of Nuance as a network-effects-driven company, but it is, because their applications and services depend on data that gets better the more people use it (or have their data harvested in one way or another). More speech in more circumstances and more domains makes Nuance better for the next user. No user thinks that Nuance is better because of them (and because many of Nuance’s products are standalone, this is in fact true). Yet Nuance, like Google, has figured out how to harvest data contributed by millions to build a better product.

Nick also took exception to my characterization of Wikipedia as a network-effects driven success:

I would also take issue with O’Reilly’s suggestion that Wikipedia’s success derives mainly from the network effect; Wikipedia doesn’t become any more valuable to me if my neighbor starts using it. Wikipedia’s success is probably better explained in terms of scale and scope advantages, and perhaps even its nonprofit status, than in terms of the network effect.

How wrong can you be? If there weren’t a network effect driving Wikipedia, Knol and Citizendium would be succeeding. Wikipedia got there first, to be sure, but they also built an infrastructure and a workflow and a philosophy that recognized that the collective of all users was smarter than any expert, and that barriers to participation would slow down improvement in the data. There isn’t a Facebook-like benefit in “belonging” to Wikipedia, but the application understands something that its competitors don’t about harnessing the network and its users to improve its data.

In short, Facebook is the obvious network effect case study. But we learn more by studying what is not obvious: the way internet sites and companies have derived competitive advantage by leveraging different kinds of network effects, most particularly (but not exclusively) to improve the data on which their services are built.

I’m making no claim, as Nick seems to think, that there are no other levers of competitive advantage in the internet era. Nor am I claiming that every network-effect business will be more successful than those that are not, precisely because there are other levers of competitive advantage, but also because some markets are more monetizable than others. But I am claiming that there will be significant differences in profitability between companies that find a network-effect sweet spot in a lucrative market, and those that embrace the commodity end of the business. And sorry, Nick, but I consider the cloud infrastructure business to be the commodity end of the business. It will look like the web hosting business, say, with a bunch of large, capital-intensive providers, and not like the hugely profitable company extracting monopoly rents that Hugh Macleod (whose post The Cloud’s Best-Kept Secret triggered my own) envisions. Monopoly rents, if they occur, will be at higher levels in the cloud stack.