How can we visualize the big players in the Web 2.0 data layer?

This post was originally published on John Battelle’s Searchblog (“Web 2 Map: The Data Layer – Visualizing the Big Players in the Internet Economy”).

As I wrote last month, I’m working with a team of folks to redesign the Web 2 Points of Control map along the lines of this year’s theme: “The Data Frame.” In the past few weeks I’ve been talking to scores of interesting people, including CEOs of data-driven startups (TrialPay and Corda, for example), academics in the public data space, policy folks, and VCs. Along the way I’ve solidified my thinking about how best to visualize the “data layer” we’ll be adding to the map, and I wanted to bounce it off all of you. So here, in my best narrative voice, is what I’m thinking.

First, of course, some data.

Data layer chart

On the left-hand side are eight major players in the Internet Economy, along with two categories of players that are critical, but which I’ve lumped together: payment players such as Visa, Amex, and Mastercard, and carriers or ISP players such as Comcast, AT&T, and Verizon.

I’ve given each company my own “finger in the air” score for seven major data categories, which are shown across the top. (I don’t claim these are correct; rather, they are clay on the wheel for an ongoing dialog.) The first six scores are in essence percentages, answering the question “What percentage of this company’s holdings are in this type of data?” The seventh, which I’ve called Wildcard data, is a 1-10 ranking of the potency of that company’s “wildcard” data that it’s not currently leveraging, but might in the future. I’ll get to more detail on each data category later.

Toward the far right, I’ve noted each company’s overall global uniques (from Doubleclick, for now, save the carriers and payment guys — I’ve proxied their size with the reach of Google). There is also an “engagement” score (again, more on that soon). The final score is a very rough tabulation computing engagement over uniques against the sum of the data scores. There are pivots to be built from this data around each of the scores for various types of data, but I’ll leave that for later. This is meant to be a relatively simple introduction to my rough thinking about the data layer. Hopefully, it’ll spark some input from you.
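For the quants who want something concrete to poke at, here is a minimal sketch of how one row of the chart might be represented and scored. The post does not spell out the exact formula, so the reading below (engagement divided by uniques, multiplied by the sum of the data scores) is only one possible interpretation. The class name, field names, and any numbers you feed it are hypothetical and are not taken from the chart.

```python
from dataclasses import dataclass

# The six percentage-style categories named in the post.
PERCENT_CATEGORIES = ("purchase", "search", "social", "interest", "location", "content")


@dataclass
class PlayerRow:
    """One row of the chart. All field names are hypothetical."""
    name: str
    percent_scores: dict   # category name -> rough percentage (0-100)
    wildcard: int          # 1-10 ranking of unleveraged "wildcard" data
    uniques: float         # global uniques, in whatever unit the chart uses
    engagement: float      # engagement score, same caveat

    def data_sum(self) -> float:
        """Sum of the six percentage scores plus the wildcard ranking."""
        percents = sum(self.percent_scores.get(c, 0) for c in PERCENT_CATEGORIES)
        return percents + self.wildcard

    def rough_score(self) -> float:
        """ASSUMED formula: (engagement / uniques) * sum of data scores."""
        return (self.engagement / self.uniques) * self.data_sum()
```

Swapping in a different reading of the final score only means changing `rough_score`; the row structure stays the same, which should also make it easy to build the per-category pivots later.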

Now, before you rip it apart, which I fully invite (especially those of you who are data quants, because I am clearly not, and I am likely mixing some apples and watermelons here), allow me to continue to narrate what I’m trying to visualize.

As you know, the map is a metaphor, showing key territories as “points of control.” The companies I’ve highlighted in the chart all have “home territories” where they dominate a sector — Google in search, Facebook in social, Amazon and eBay in commerce, etc. What I plan to do is create a layer based on the data in the chart that, when activated, shows those companies’ relative size and strength.

But how?

Well, the best idea we’ve come up with so far is to show each as a small city of sorts, where the relative height of the buildings is determined by a corresponding data point. So Twitter, for example, will have a tall building in the middle of its city, representing “Interest data.” Google’s tallest building will be search. Facebook’s will be social, and so on. And of course the cities can’t all be on the same scale, hence our use of total global uniques and total engagement. Yahoo may be nearly as big as Facebook, but it doesn’t have nearly the engagement per user. So its city will be smaller, relatively, than Facebook’s.
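To make the city idea a little more tangible, here is a rough sketch of how building heights could be derived. It assumes that a city's overall scale is the product of its normalized uniques and normalized engagement, and that each building's height is its category score shrunk by that scale. Both choices are my guesses for illustration, as are the function names and the `max_height` parameter; the actual design may well work differently.

```python
def city_scale(uniques: float, engagement: float,
               max_uniques: float, max_engagement: float) -> float:
    """Relative city size between 0 and 1.

    ASSUMPTION: scale is normalized reach times normalized engagement,
    so a player with big reach but low engagement gets a smaller city.
    """
    return (uniques / max_uniques) * (engagement / max_engagement)


def building_heights(scores: dict, scale: float, max_height: float = 100.0) -> dict:
    """Map each data category to a building height.

    `scores` holds category -> rough percentage (0-100). A category at
    100 percent in a full-scale city would reach `max_height`.
    """
    return {
        category: max_height * (score / 100.0) * scale
        for category, score in scores.items()
    }


def tallest_building(heights: dict) -> str:
    """Name of the category whose building dominates the skyline."""
    return max(heights, key=heights.get)
```

With something like this, a city with a high interest score would come out with "interest" as its tallest tower, and lowering either uniques or engagement would shrink every building in that city by the same factor.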

Building preview

What is interesting about this approach is that each company’s “cityscape” emerges as distinct. Microsoft’s is wide but not tall, since they have a lot of data in a number of areas. It will probably end up looking like a suburban office park, which, funnily enough, is what Microsoft really looks like, for the most part. Amazon and eBay will have high towers of payment data, with a smattering of shorter buildings. And so on. I don’t have a good visualization of this yet, but the designers I’m working with at Blend have sketched out a very rough early version just so you can get the idea (see image to the right). The final structures will be more whimsical, and of course be keyed with color.

I’m even thinking of adding other features, like “openness” — i.e., can you access, gain copies of, share, and mash up the data controlled by each company? If so, the city won’t be walled. Apple, on the other hand, may well end up a walled city, with a moat, on top of a hill.

Now, a bit more detail on the data categories. You all gave me a lot of really good input on my earlier post, where I posited these original categories. But I’ve kept them the same, save the addition of the wildcard data. Why? Because I think each can be interpreted as larger buckets containing a lot of other data. I’ll go through each briefly in turn:

Purchase Data: This is information about who buys what, in essence. But it’s also who almost buys what (abandoned carts), when they buy, in what context, and so on.

Search Data: The original database of intentions — query data, path from query data, “intent” data, and tons more search signals.

Social Data: Social graph, but also identity data. Not to mention how people interact inside their graphs, etc.

Interest Data: This is data that describes what is generally called “the interest graph” — declarations of what people are interested in. It’s related to content, but it’s not just content consumption. It includes active production of interest datapoints, like tweets, status updates, checkins, etc.

Location Data: This is data about where people are, to be sure, but also data about how often we are there, and other correlated data — i.e., what apps we use in location context, who else is there and when, etc.

Content Data: Content is still king in our world, and knowing patterns of content consumption is a powerful signal. This is data about who reads/watches/consumes what, when, and in what patterns.

Wildcard Data: This is data that is uncategorized, but could have huge implications. For example, Microsoft knows how people interact with their applications and OS. Microsoft and Google have a ton of language data (phonemes, etc.). Carriers see just about everything that passes across their servers, though their ability to use it might be regulated. Google, Yahoo and Microsoft have tons of email interaction data. And so on …

Now, of course all these data categories get more powerful as they are leveraged one against the other, and of course, I’ve left tons of really big data players off the map entirely (small startups like Tynt, Quora, or Sharethis have massive amounts of data, as do very large companies like Nielsen, Quantcast, etc.). But you have to make choices to make something like this work.

So, that’s where we are with the Web 2 Summit map data layer. Naturally, once the data layer is live, it will be driven by a database, so we can tweak the size and scope of the cities and buildings based on the collective intelligence of the map users’ feedback.

What do you think? What’s your input? We’ll be building this over the next two months, and I’d love your feedback before we get too far down the line. Thanks!
