My "Outdated View" of the Semantic Web

I’ve gotten hammered in the comments on my post about Freebase for suggesting that the semantic web was only about controlled ontologies.

James Hendler, the primary author of the OWL FAQ, wrote:

I’m glad you’re beginning to grok what the Semantic Web is about, but I must take issue with your claim that “unlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions.”

If you look at what I’ve been writing since 2001 (in the Semantic Web article in Scientific American, coauthored with Tim Berners-Lee and Ora Lassila) through my recent posts on the “Dark side of the Semantic Web” – I, and many others, have not been arguing for controlled ontologies – rather, we designed the Semantic Web technologies, and especially OWL, to encourage linking and reuse. We do believe there will be some carefully controlled ontologies in high value areas (such as the Cancer ontology which the National Cancer Institute maintains) but that much use would be by extension and linking to these….

With due respect, I think you, and even more egregiously Clay Shirky, have been misrepresenting what the Semantic Web is, and critiquing based on that misunderstanding, not on the reality.


Stefano Mazzocchi, whom I know through his work on Apache, wrote to me in email rather than in the comments (but gave me permission to republish here):

I read your blog post about Freebase with interest but I think you have an outdated understanding of the ideas around the semantic web and the current state of the art around it.

In my day job I work for the SIMILE project at MIT and we follow an “emergent structure” approach in every aspect of our design.

Tools like Piggy Bank or Exhibit are based on RDF and the core design decisions of the semantic web, but they are completely (and explicitly so!) ontology agnostic.

Our wiki is powered by Semantic MediaWiki, a MediaWiki plugin that allows us to deliver all sorts of additional value to our web pages in the form of RDF statements (with which we can create ‘meta’ pages that are a “view” of queries against the wiki-as-a-database; look at the wikitext of that page to understand what I’m talking about). [Note from Tim: you have to log in to do so.]

Semantic MediaWiki does NOT regulate your use of ontologies; you can just add some special wikitext to your data and it will generate RDF statements for you, then you can further augment that data to add ontologies on top.

I agree with you that most of the past research and development around the semantic web has been polluted with ontologies and rigid formalisms, but this is NOT inherent in the semantic web model. [It] is just a way to use it. TimBL’s own thoughts about ontologies in the semantic web clearly indicate that he does not believe in an ontologically-heavy use of the semantic web as a way to bootstrap it.

Also, you might want to take a look at dbpedia.
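To make Stefano’s point about Semantic MediaWiki concrete, here is a minimal sketch of the idea, not Semantic MediaWiki’s actual code. It assumes a simplified version of the inline `[[property::value]]` annotation syntax and a made-up base URI (`http://example.org/wiki/`), and the helper names `page_to_triples` and `to_ntriples` are hypothetical. Each annotation simply becomes a subject-predicate-object triple; no ontology is consulted, so any property an editor invents becomes a new predicate.

```python
import re
from urllib.parse import quote

# Hypothetical base URI; a real wiki would use its own namespace.
BASE = "http://example.org/wiki/"

# Matches simplified [[property::value]] annotations (an assumption
# modeled loosely on Semantic MediaWiki's inline syntax). Plain links
# such as [[Timeline]] have no "::" and are ignored.
ANNOTATION = re.compile(r"\[\[([^:\]]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


def uri(name: str) -> str:
    """Turn a page or property name into a URI under BASE."""
    return BASE + quote(name.strip().replace(" ", "_"), safe=":")


def page_to_triples(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Emit one (subject, predicate, object) triple per annotation.

    No ontology is checked: whatever property name an editor types
    becomes a predicate, so structure emerges from use.
    """
    subject = uri(page)
    triples = []
    for prop, value in ANNOTATION.findall(wikitext):
        triples.append((subject, uri("Property:" + prop), value.strip()))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples as N-Triples, treating every object as a plain literal."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


text = "Exhibit is part of the [[project::SIMILE]] family and reads [[data format::JSON]]."
print(to_ntriples(page_to_triples("Exhibit", text)))
```

Ontologies can then be layered on afterwards, for example by stating elsewhere what `Property:project` means, without touching the pages themselves.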

These comments are certainly an eye-opener. The SIMILE projects are really cool; before this, I’d only looked at Timeline. I’m still not completely convinced that my ideas about the semantic web are wrong, but I’ll certainly accept Stefano’s assertion that they are “outdated.” The view I’m being castigated for does seem to be a big part of the semantic web’s history and culture. Even Jim’s “dark side” post (linked to above) refers to this:

[this is] the side of the Semantic Web that is looking at how do you use a small amount of Sem Web (think Foaf or Skos) to add a bit of organizational knowledge (and to webize with URIs) to tagging sites, microformats, and so on. It is the realization that the REST approach to the world is a wonderful way to use RDF and it is empowered by the emerging standards of SPARQL, GRDDL, RDF/A and the like. In short, it is the Semantic Web vision of Tim’s, before Ora and I polluted it with all this ontology stuff, coming real!

In short, it sounds like the bottom-up approach to Web 2.0 and the current thinking on the Semantic Web are growing closer together every day.

Just to be clear, I’ve always loved the vision of the Semantic Web. But much of the early work at the W3C always seemed to me to be a case of premature standardization, which is why I’ve stayed away from it. I’m a big believer in the early IETF mantra, “No kings, no priests, just a rough consensus and running code.”

It’s always seemed to me that Web 2.0 as it was evolving would eventually turn into the Semantic Web, just that it was too early to specify the means by which it would do so. What’s at issue is not where we’re going, but what tools we will use to get there.

PageRank and all the other heuristics that Google has developed for relevance, to give one example, seem to me to be an example of a kind of implicit ontology that would never be developed or even modeled a priori by a W3C committee. Similarly, the tag structure that emerges from collective activity on Technorati or Flickr would never be modeled as an ontology, but it could perhaps be expressed by one after the fact.
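As a rough illustration of expressing tag structure after the fact, here is a minimal sketch that turns a set of tagged items into SKOS statements. The SKOS and RDF namespace URIs are the real W3C ones, but everything else is an assumption: the tag namespace `http://example.org/tag/` is made up, the function `tags_to_skos` and the sample photos are hypothetical, and linking tags that co-occur on at least `min_count` items is just one simple heuristic chosen for illustration, not how Technorati or Flickr actually work.

```python
from collections import Counter
from itertools import combinations

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TAG_BASE = "http://example.org/tag/"  # hypothetical namespace; assumes URI-safe tag names


def tags_to_skos(tagged_items: dict[str, set[str]], min_count: int = 2) -> list[tuple[str, str, str]]:
    """Derive SKOS triples from existing tags, after the fact.

    Every tag becomes a skos:Concept, and two tags are linked with
    skos:related when they co-occur on at least `min_count` items
    (an illustrative co-occurrence heuristic).
    """
    pair_counts: Counter = Counter()
    all_tags: set[str] = set()
    for tags in tagged_items.values():
        all_tags |= tags
        # Count each unordered pair of tags once per item.
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1

    triples = [(TAG_BASE + t, RDF_TYPE, SKOS + "Concept") for t in sorted(all_tags)]
    for (a, b), n in sorted(pair_counts.items()):
        if n >= min_count:
            # skos:related is symmetric; emitting both directions spares
            # consumers from needing a reasoner to infer the reverse link.
            triples.append((TAG_BASE + a, SKOS + "related", TAG_BASE + b))
            triples.append((TAG_BASE + b, SKOS + "related", TAG_BASE + a))
    return triples


photos = {
    "photo1": {"sunset", "beach", "ocean"},
    "photo2": {"beach", "ocean"},
    "photo3": {"sunset", "city"},
}
for s, p, o in tags_to_skos(photos):
    print(s, p, o)
```

Nothing here was designed up front: the "ontology" is whatever the taggers' collective behavior implies, written down in a standard vocabulary once it exists.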

What’s going to be really interesting is to see how the Semantic Web technologies develop now that we have actual, real life, messy use cases to work from, derived by people who don’t think about rigor, but just about the shortest path to the jam jar.