Sep 18

Tim O'Reilly


Economist Confused About the Semantic Web?

Jimmy Guterman made brief reference recently to an Economist story that described three companies as "semantic web companies." While flattered by the attention (all three are O'Reilly investments -- I am a personal investor in Rael Dornfest's ValuesOfN, creator of the Sandy application described in the story, and OATV is an investor in both Wesabe and Tripit), I am confused by the "semantic web" characterization.

The article notes:

The semantic web is so called because it aspires to make the web readable by machines as well as humans, by adding special tags, technically known as metadata, to its pages. Whereas the web today provides links between documents which humans read and extract meaning from, the semantic web aims to provide computers with the means to extract useful information from data accessible on the internet, be it on web pages, in calendars or inside spreadsheets.

The article then goes on to describe how RDF, OWL, and SPARQL are the foundational technologies of the Semantic Web. I'm with them so far. But as far as I know, none of the three companies profiled are based on these Semantic Web technologies. In short, the Economist is using "the semantic web" as an equivalent to "this application seems able to do things that we used to think took a person to do."
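
To make the Semantic Web approach concrete: its core data model represents facts as subject-predicate-object triples, which SPARQL then queries by pattern matching. The sketch below is a toy illustration using plain Python tuples rather than a real RDF library, and all of the names in it are invented for the example.

```python
# Toy model of the Semantic Web's triple store: facts as
# (subject, predicate, object) tuples, queried SPARQL-style
# by leaving some positions as wildcards.

triples = [
    ("wesabe.com", "type", "company"),
    ("tripit.com", "type", "company"),
    ("wesabe.com", "category", "personal-finance"),
    ("tripit.com", "category", "travel"),
]

def query(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which subjects have type company?" -- the SPARQL analogue of
# SELECT ?s WHERE { ?s :type :company }
print(query(p="type", o="company"))
```

The point of the formalism is that nothing about "type" or "company" lives in the application code; the meaning is declared in the data itself.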

That's not a bad concept, and it's certainly aligned with the vision of the semantic web. And ultimately, no one will care what kind of technology is under the covers.

But I want to return to some ideas I outlined in a previous post entitled Different approaches to the semantic web. There's no question in my mind that applications are becoming more intelligent. But there are lots of questions about the right way to get there. The Semantic Web (capitalized to refer to the suite of official technologies) is about developing languages, if you will, with which we can encode meaning into documents in such a way that they are more accessible to computers.

By contrast, I've argued that one of the core attributes of "web 2.0" (another ambiguous and widely misused term) is "collective intelligence." That is, the application is able to draw meaning and utility from data provided by the activity of its users, usually large numbers of users performing a very similar activity. So, for example, collaborative filtering applications like Amazon's "people who bought this item also bought" recommendations or last.fm's music recommendations use specialized algorithms to match users with each other on the basis of their purchases or listening habits. There are many other examples: Digg users voting up stories, or Wikipedia's crowdsourced encyclopedia and news stories.
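
The simplest form of that "people who bought this also bought" idea is just co-occurrence counting over users' histories. This is a minimal sketch, not Amazon's actual algorithm, and the purchase data is invented for illustration:

```python
# Minimal item-to-item collaborative filtering: count how often pairs
# of items appear together in users' purchase histories, then recommend
# the most frequent co-purchases.
from collections import Counter
from itertools import combinations

purchases = {
    "alice": {"book-a", "book-b", "book-c"},
    "bob":   {"book-a", "book-b"},
    "carol": {"book-b", "book-c"},
}

co_counts = Counter()
for items in purchases.values():
    for x, y in combinations(sorted(items), 2):
        co_counts[(x, y)] += 1
        co_counts[(y, x)] += 1

def also_bought(item, n=2):
    """Items most often bought alongside `item`, best first."""
    pairs = [(other, c) for (a, other), c in co_counts.items() if a == item]
    return [other for other, _ in sorted(pairs, key=lambda p: -p[1])[:n]]

print(also_bought("book-a"))  # book-b co-occurs twice, book-c once
```

No user ever labels anything; the "votes" are simply the purchases themselves, which is exactly the collective-intelligence pattern described above.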

But for me, the paradigmatic example of Web 2.0 is Google's Pagerank. Not only did it lead to the biggest financial success story to date, it is also the example that makes us think hardest about the true meaning of "collective intelligence." What Larry Page realized was that meaning was already being encoded unconsciously by web page creators when they linked one page to another. Understanding that a link was a vote allowed Google to give better search results than the engines that, up to that time, had simply searched the contents of the various documents on the web.
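
The "link as vote" idea can be sketched in a few lines: each page repeatedly redistributes its score along its outbound links until the scores stabilize. This is a stripped-down illustration of the published Pagerank formulation, not Google's production system, and the link graph is invented:

```python
# Simplified Pagerank by power iteration: a page's rank is the share of
# "votes" flowing to it through inbound links, with a damping factor
# modeling a surfer who occasionally jumps to a random page.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outbound in links.items():
            share = damping * rank[page] / len(outbound)
            for target in outbound:
                new[target] += share
        rank = new
    return rank

ranks = pagerank(links)
# "c" collects votes from both "a" and "b", so it ends up ranked highest
print(ranks)
```

Note that no page author ever intended to rank anything; the algorithm extracts the meaning their links had already encoded.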

And so, it seems to me that Pagerank illustrates the fundamental difference between the approaches of the Semantic Web and Web 2.0. The Semantic Web sees meaning as something that needs to be added to documents so that computers can act intelligently on them. Web 2.0 seeks to discover the ways that meaning has already been implicitly encoded by the way people use documents and digital objects, and then to extract that meaning, often by statistical means, by studying large aggregates of related documents.

Looking at it this way, you can see that Wesabe is very much a Web 2.0 company. Their fundamental insight is that the way that people spend money is a vote, just like a link is for Pagerank, and that you can use that aggregated vote to build various kinds of intelligent user-facing services.

ValuesOfN (Sandy) and Tripit are much less obviously Web 2.0 in that sense. They provide services to an individual, based on his or her own data, with little or no "collective intelligence" benefit. They do, however, work to extract meaning from documents, rather than requiring that it be structured in some special way.

Sandy uses a "little language" so that email can be used to instruct a bot to keep track of various kinds of information on your behalf. Tripit merely recognizes that certain types of documents -- in particular, reservation confirmations from airlines, hotels, and rental cars -- have clear structure from which meaning can be extracted. It then mashes up additional relevant data such as maps and user notes to construct useful itinerary documents. These are both applications that are closer to the Semantic Web way of thinking, but again, no formal ontology was ever developed. The application developers simply realized that there was sufficient meaning in scope-limited interactions that it became possible to build a seemingly intelligent agent.
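
The Tripit-style idea can be made concrete with a toy example: a reservation confirmation is regular enough that a handful of patterns recover an itinerary with no formal ontology at all. The email format, field names, and patterns below are entirely invented for illustration, not Tripit's actual parser:

```python
# Sketch of structured extraction from a confirmation email: a few
# regular expressions pull fields out of text that was never formally
# tagged, because the document type itself implies the structure.
import re

email = """\
Confirmation: ABC123
Flight: UA 512
Depart: SFO 2007-09-18 08:05
Arrive: JFK 2007-09-18 16:40
"""

patterns = {
    "confirmation": r"Confirmation:\s*(\S+)",
    "flight": r"Flight:\s*(.+)",
    "depart": r"Depart:\s*(.+)",
    "arrive": r"Arrive:\s*(.+)",
}

# Keep only the fields whose pattern actually matched.
itinerary = {field: m.group(1).strip()
             for field, pat in patterns.items()
             if (m := re.search(pat, email))}

print(itinerary)
```

The scope limitation is what makes this work: within the narrow world of travel confirmations, the structure is reliable enough that no one needs to agree on a shared vocabulary first.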

Where I think the two approaches meet, and will find common cause, is in the design of applications that don't require people to think at all about ontology or document structure, but nonetheless produce documents that are more amenable to the automated extraction of meaning. Social networking is a good example. Users are gradually building out the "Friend of a Friend" network that FoaF advocates dreamed of. It's missing a lot of the nuanced intelligence that was designed into the FoaF specification -- after all, what does it mean that someone is your friend on Facebook? By focusing on short-term utility (initially, getting to know your classmates in a college context), Facebook built something that was adopted far more quickly than the carefully thought out FoaF project.

The Semantic Web is a bit of a slog, with a lot of work required to build enough data for the applications to become useful. Web 2.0 applications often do a half-assed job of tackling the same problem, but because they harness self-interest, they typically gather much more data, and then solve for their deficiencies with statistics or other advantages of scale.

But I predict that we'll soon see a second wave of social networking sites, or upgrades to existing ones, that provide for the encoding of additional nuance. In addition, there will be specialized sites -- take Geni, for example, which encodes genealogy -- that will provide additional information about the relationships between people. Rather than there being a single specification capturing all the information about relationships between people, there will be many overlapping (and gap-ridden) applications, and an opportunity for someone to aggregate the available information into something more meaningful.

I expect there to be a lot of confusion before things start to become clear. But I'm confident that in the end, Web 2.0 and the Semantic Web are going to meet in the middle and become best friends.


Comments: 5

  J.O. Urban [10.29.07 01:58 PM]

The inherent attributes of semantic intelligence and coding also decrease its ability to be manipulated significantly. Although any system can eventually be manipulated with enough effort and resources behind the task, it becomes increasingly difficult as the system evolves to use larger sets of data gathered from a wide range of samples.

  Curtis [11.18.07 06:06 AM]

Insightful comment on what makes Google the 'first' web 2.0 application.

You're right that self-interest encourages us to build the technology that we can monetize in shortest order.

Also, as you hinted at, nobody really cares what the web is built on only that they can use it. It's pretty cool that the users of the web span so many different interests from exploring to selling to mashing. It works for just about everybody on some level.

But, as I read more about this Semantic Web (from the academician's standpoint) it seems like the real panacea is to make the web work for absolutely everybody everywhere in whatever way they can use. Computer languages like XML, RDF, or HTML don't grow the usefulness of the web; interest does. Since interest spans the gamut of the hierarchy of needs, creating a web that meets all interests seems to be the final goal of a semantic web.

  P-Air [12.17.07 11:14 AM]

The marriage of Web 2.0 and the Semantic Web appears to be happening with the incarnation of MetaWeb, where in effect people power, a la Wikipedia, is building up the semantics of objects, places & people. On the heels of this effort, there's also Radar Networks' recent entry into this space with Twine. While this latter example still has a way to go before it's ready for prime time, it does provide another example of this marriage between Web 2.0 and the Semantic Web. I've always thought that conceptually the Semantic Web was a great idea, but in practice it doesn't scale. Seeing the efforts of MetaWeb and Radar Networks, I'm now convinced that they are working around the poor scaling issues left unanswered by the Semantic Web's proponents through some smart social engineering.

  Jim [06.02.08 06:17 PM]

'...meaning was already being encoded unconsciously..' This to me is where the Web becomes incredibly exciting. Meaning is being encoded unconsciously across the Web at a rapid pace creating a vast resource to be harvested in new and fascinating ways. I think we've only begun to understand the power of collective intelligence, and applying social context adds a whole new dimension to the equation.

  Derrick [06.04.08 01:55 AM]

There may be a misplaced tendency to regard the human brain as a 'memory bank'.

As I reach a mature age (in my 80th yr.) I contemplate the many things I knew that I no longer know. I wonder if they were more accurate, or meaningful, than what I know now.

I, too, read that Economist article last year and wondered if I could retain, or reaccess, the semantics that appealed to me in the past.

Clearly the semantics of the human brain could be a moving target. And that complicates any attempt to simulate semantics.
