My "Outdated View" of the Semantic Web

Sat, Mar 10, 2007



I've gotten hammered in the comments on my post about Freebase for suggesting that the Semantic Web was only about controlled ontologies.

James Hendler, the primary author of the OWL FAQ, wrote:

I'm glad you're beginning to grok what the Semantic Web is about, but I must take issue with your claim that "unlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions."
If you look at what I've been writing since 2001 (in the Semantic Web article in Scientific American, coauthored with Tim Berners-Lee and Ora Lassila) through my recent posts on the "Dark side of the Semantic Web" - I, and many others, have not been arguing for controlled ontologies - rather, we designed the Semantic Web technologies, and especially OWL, to encourage linking and reuse. We do believe there will be some carefully controlled ontologies in high-value areas (such as the Cancer ontology that the National Cancer Institute maintains) but that much use would be by extension and linking to these....

With due respect, I think you, and even more egregiously Clay Shirky, have been misrepresenting what the Semantic Web is, and critiquing based on that misunderstanding, not on the reality.

Ouch!

Stefano Mazzocchi, whom I know through his work on Apache, wrote to me in email rather than in the comments (but gave me permission to republish here):

I read your blog post about Freebase with interest but I think you have an outdated understanding of the ideas around the semantic web and the current state of the art around it.
In my day job I work for the SIMILE project at MIT, and we follow an "emergent structure" approach in every aspect of our design.

Tools like Piggy Bank or Exhibit are based on RDF and the core design decisions of the semantic web, but they are completely (and explicitly so!) ontology agnostic.

Our wiki is powered by Semantic MediaWiki, a MediaWiki plugin that allows us to deliver all sorts of additional value to our web pages in the form of RDF statements, with which we can create 'meta' pages that are a "view" of queries against the wiki-as-a-database (look at the wikitext of that page to understand what I'm talking about). [Note from Tim: you have to log in to do so.]

Semantic MediaWiki does NOT regulate your use of ontologies: you can just add some special wikitext to your data and it will generate RDF statements for you; then you can further augment that data by adding ontologies on top.

I agree with you that most of the past research and development around the semantic web has been polluted with ontologies and rigid formalisms, but this is NOT inherent in the semantic web model. [It] is just a way to use it. TimBL's own thoughts about ontologies in the semantic web clearly indicate that he does not believe in an ontologically-heavy use of the semantic web as a way to bootstrap it.

Also, you might want to take a look at DBpedia.
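Stefano's description of Semantic MediaWiki can be made concrete with a small sketch. Semantic MediaWiki lets authors attach a property to an ordinary wiki link, as in `[[Property::Value]]`; the Python below is a hypothetical, much-simplified extractor that turns such annotations into (subject, predicate, object) triples. It is not Semantic MediaWiki's own code: the base URI `http://wiki.example.org/id/` and the function names are made up for illustration, and every value is kept as a plain string rather than typed data.

```python
import re

# Simplified pattern for Semantic MediaWiki-style annotations:
# [[Property::Value]] or [[Property::Value|display text]].
# Plain links such as [[Page]] contain no "::" and are ignored.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")

# Hypothetical base URI for turning wiki names into URIs.
BASE = "http://wiki.example.org/id/"


def to_uri(name: str) -> str:
    """Turn a wiki page or property name into a URI; spaces become underscores."""
    return BASE + name.strip().replace(" ", "_")


def extract_triples(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Return one (subject, predicate, object) triple per annotation on the page."""
    return [
        (to_uri(page), to_uri("Property:" + prop), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


if __name__ == "__main__":
    text = ("Exhibit is written in [[Written in::JavaScript]] and maintained by "
            "[[Developed by::SIMILE|the SIMILE project]]; see also [[Timeline]].")
    for triple in extract_triples("Exhibit", text):
        print(triple)
```

Note that no ontology is involved: the property names are whatever the page author typed, and any vocabulary mapping could be layered on later.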

These comments are certainly an eye-opener. The SIMILE projects are really cool -- before this, I'd only looked at Timeline. I'm still not completely convinced that my ideas about the semantic web are wrong, but I'll certainly accept Stefano's assertion that they are "outdated." The view I'm castigated for does seem to be a big part of the semantic web's history and culture. Even Jim's "dark side" post (linked to above) refers to this:

[this is] the side of the Semantic Web that is looking at how do you use a small amount of Sem Web (think Foaf or Skos) to add a bit of organizational knowledge (and to webize with URIs) to tagging sites, microformats, and etc. It is the realization that the REST approach to the world is a wonderful way to use RDF and it is empowered by the emerging standards of SPARQL, GRDDL, RDF/A and the like. In short, it is the Semantic Web vision of Tim's, before Ora and I polluted it with all this ontology stuff, coming real!

In short, it sounds like the bottom-up approach to Web 2.0 and the current thinking on the Semantic Web are growing closer together every day.

Just to be clear, I've always loved the vision of the Semantic Web. But much of the early work at the W3C always seemed to me to be a case of premature standardization, which is why I've stayed away from it. I'm a big believer in the early IETF mantra, "No kings, no priests, just a rough consensus and running code."

It's always seemed to me that Web 2.0 as it was evolving would eventually turn into the Semantic Web, just that it was too early to specify the means by which it would do so. What's at issue is not where we're going, but what tools we will use to get there.

PageRank and all the other heuristics that Google has developed for relevance, to give one example, seem to me to be a kind of implicit ontology that would never be developed or even modeled a priori by a W3C committee. Similarly, the tag structure that emerges from collective activity on Technorati or Del.icio.us or Flickr would never be modeled as an ontology -- but it could perhaps be expressed by one after the fact.
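As a rough illustration of expressing emergent tag structure after the fact, here is a sketch that derives SKOS `related` links from tags that co-occur on the same bookmarks. The co-occurrence threshold, the tag namespace `http://tags.example.org/`, and the sample data are all assumptions; no tagging site is claimed to work this way.

```python
from collections import Counter
from itertools import combinations

SKOS_RELATED = "http://www.w3.org/2004/02/skos/core#related"
TAG_BASE = "http://tags.example.org/"  # hypothetical namespace for tag URIs


def related_tags(bookmarks, min_count=2):
    """Emit skos:related triples for tag pairs that co-occur on >= min_count bookmarks."""
    pair_counts = Counter()
    for tags in bookmarks:
        # Deduplicate and sort so each unordered pair is counted once per bookmark.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return sorted(
        (TAG_BASE + a, SKOS_RELATED, TAG_BASE + b)
        for (a, b), count in pair_counts.items()
        if count >= min_count
    )


if __name__ == "__main__":
    bookmarks = [
        ["rdf", "semweb", "w3c"],
        ["semweb", "rdf"],
        ["photos", "flickr"],
        ["rdf", "sparql"],
    ]
    for triple in related_tags(bookmarks):
        print(triple)
```

Because `skos:related` is symmetric, the sketch emits each pair only once, in alphabetical order; the structure comes from usage rather than from an up-front design.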

What's going to be really interesting is to see how the Semantic Web technologies develop now that we have actual, real life, messy use cases to work from, derived by people who don't think about rigor, but just about the shortest path to the jam jar.


Comments: 14

Pierce Prelsey [03.10.07 08:54 AM]

It just goes to show how hard it is to keep up with all these emerging and evolving technologies (for lack of a better term) when one of the people we mere mortals look to for guidance gets a case of definition lag like this.
Was it inevitable that Tim get caught like this, if not in the Semantic Web then in, perhaps, the latest developments in ActionScript? He is, of course, only human (we surmise). And as such, he can't possibly be on top of everything: like the universe, human knowledge, even in the relatively narrow field of computers and the Internet, is expanding rapidly.
At what point does the (necessary) restriction of our expertise make everything outside of it magic, à la Arthur C. Clarke's observation that sufficiently advanced technology appears as such?

monopole [03.10.07 11:39 AM]

Actually Clarke's first two laws apply better:

1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.

Simply put, the Semantic Web will come when people need it bad enough, and it will look an awful lot more like HTTP than Xanadu.

Ontologies suffer from two essential problems:
1. They are top-down, requiring imposition from a central authority: much more Cathedral than Bazaar. (Actually, that last sentence is pretty ontology-proof but perfectly suited to folksonomy.)

2. They suffer from the Sapir-Whorf hypothesis, in that they constrain knowledge to what is known, as opposed to allowing the introduction of neologisms and new flexible categories.

steve [03.10.07 08:50 PM]

The semantic web reminds me of early AI... the people castigating you for your earlier understanding are the same people shifting the grounds of what the Semantic Web was supposed to be about.

There is a reason why someone as brilliant as Danny Hillis is releasing his project in that form, and it's because it's the most practical step that he can work out. If that is not the Semantic Web, then that is just too bad. Life is supposed to be sunshine, cuddly puppies and cute little butterflies. Defining that as Semantic Life doesn't bring it any closer, so we have to work step by step towards improving things.

I've been talking to Semantic Web people for the last 10 years, and none of them have impressed me as having any real idea of how to make things work... even though some of them are as verbally impressive and conversationally as sharp as Marvin Minsky.

Unfortunately, rapier wit and snappy repartee do not make a scientific model, but it can be very difficult to tell the difference between someone who sounds like they know what they're talking about and someone who does know what they're talking about.

So I welcome the work that Danny Hillis is doing, and mostly ignore those who talk in definitions and what it should be. Progress is measured in what is and what happens, not what should and what might.

Danny [03.11.07 04:40 AM]

Re. "In short, it sounds like the bottom-up approach to Web 2.0 and the current thinking on the Semantic Web are growing closer together every day."

That still implies the Web 2.0 movement initiated the "bottom-up" style - but it's there in Tim Berners-Lee's design notes from long before the "2.0" was conceived.

Re. "...much of the early work at the W3C always seemed to me to be a case of premature standardization..."

Possibly, but bear in mind that what the Semantic Web languages set out to achieve is essentially webizing what people have been doing with programs and data all along. There are strong similarities between e.g. object-oriented programming, entity-relationship modelling, relational databases and the RDF/OWL model. What's new is the web.

It's also worth remembering that not all development is "paving the cowpaths". There wasn't a web before HTTP; the web standards are qualitatively different from predecessors like Gopher.

Re.
[[
What's going to be really interesting is to see how the Semantic Web technologies develop now that we have actual, real life, messy use cases to work from, derived by people who don't think about rigor, but just about the shortest path to the jam jar.
]]
Absolutely. The Semantic Web has historically been subject to a chicken/egg problem. But now we have the tools and specifications on one hand (thanks to the W3C and avant-garde hackers), and increasingly the applications and data on the other (thanks to Web 2.0).

I've written my own prognostications in articles for the IEEE (PDFs): The Shortest Path to the Future Web and From Here to There.

Tim O'Reilly [03.11.07 08:50 AM]

I totally agree that Web 2.0 is really just Tim's original Web 1.0 by another name. It was Web 1.5, which thought that the web was some kind of analogue to television, that got it wrong. I say this whenever I get a chance because I've come to understand that some people seem to be taking "Web 2.0" as somehow deprecating Tim's original vision. (I expand on this point in this recent IBM developerWorks interview with me.)

That being said, I do think that Web 2.0 is the Semantic Web as it actually is happening, rather than as it was imagined and specified. PageRank is such a great example. There was no ontology at all -- fixed OR extensible -- just a realization that there was already meaning to be extracted from the data that was already there.
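The PageRank example can be illustrated with the textbook power-iteration formulation: a page's score is the damped sum of the scores of pages linking to it, each linker's score being split across its outgoing links. This is a minimal sketch, not Google's production system; the damping factor of 0.85 is the commonly cited value, and the four-page graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the "random jump" share of the total rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this toy graph, page "c", which three of the four pages link to, ends up ranked highest, and nothing beyond the links themselves was declared.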

Perhaps one way to express the difference (I'm still thinking whether it's accurate or not) might be to say that the Semantic Web theorists have approached the problem by trying to build structures that will allow machines to make deductions, while what's actually happened is that Web 2.0 entrepreneurs have figured out how to help machines do inductive reasoning instead.
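For contrast, the deductive side can be sketched with two RDFS entailment rules: subclass transitivity (rdfs11) and propagation of types through subclasses (rdfs9). This is a naive forward-chaining sketch, not a production reasoner; the `ex:` class and instance names are hypothetical and only loosely echo the cancer-ontology example above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def infer_types(triples):
    """Forward-chain RDFS rules rdfs9 and rdfs11 until no new triples are produced."""
    facts = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in facts if p == SUBCLASS_OF}
        derived = set()
        # rdfs11: subClassOf is transitive (A < B and B < C gives A < C).
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    derived.add((a, SUBCLASS_OF, d))
        # rdfs9: an instance of a subclass is also an instance of its superclass.
        for x, p, cls in facts:
            if p == RDF_TYPE:
                for sub, sup in subclass:
                    if sub == cls:
                        derived.add((x, RDF_TYPE, sup))
        if derived <= facts:
            return facts
        facts |= derived


if __name__ == "__main__":
    # Hypothetical "ex:" terms, not taken from any real ontology.
    data = {
        ("ex:Melanoma", SUBCLASS_OF, "ex:SkinCancer"),
        ("ex:SkinCancer", SUBCLASS_OF, "ex:Cancer"),
        ("ex:case42", RDF_TYPE, "ex:Melanoma"),
    }
    for triple in sorted(infer_types(data) - data):
        print(triple)
```

Here the new facts follow logically from declared structure, whereas the PageRank sketch above derives ranking from observed link patterns.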

Kingsley Idehen [03.11.07 12:18 PM]

Tim,

Web 1.0 - Web of Hypertext (blurb distributed over HTTP and encoded in HTML). The "V" (Viewer or Interactive Web).

Web 2.0 - API (SOAP, REST, XML-RPC) Driven Web of Services (or Services Web). The "C" (Controller or Application logic).

Web 3.0 - Data Model driven Web oriented towards granular access to Web Data. The "M" that completes the M-V-C pattern as applied to the Web Platform. The Web as a Database by way of the RDF Data Model (which has RDF/XML, RDF/N3, RDF/Turtle, RDF/TriX etc. as instance data interchange/serialization formats).

Structured Blurb is not Granular Representation of Data. This is where XML and XPath/XQuery blur matters re. the Data Web and its RDF Model (which is most often mistaken for RDF/XML). The Web of Data is a Web of RDF Data Sources as opposed to XML Data Sources (*the source of much confusion*) -- basically the difference between a semi-structured (X)HTML page containing the Fortune 400 and a structured data source that provides granular access to data about the group or each member.

The RDF Model is, when all is said and done, an open and standardized way of producing a time-tested Entity Data Model.
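To illustrate the "granular access" point, here is a sketch of RDF-style data held as a set of triples, queried with a pattern in which `None` acts as a wildcard, loosely analogous to a single SPARQL triple pattern. The companies, predicates, and values are invented; a real system would use an RDF store and SPARQL rather than this toy function.

```python
from typing import Optional

Triple = tuple[str, str, str]


def match(triples: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern (s, p, o); None matches anything."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Invented data about two fictional companies.
    data = {
        ("ex:Acme", "ex:rank", "12"),
        ("ex:Acme", "ex:headquarters", "Ohio"),
        ("ex:Globex", "ex:rank", "7"),
        ("ex:Globex", "ex:headquarters", "Texas"),
    }
    print(match(data, s="ex:Acme"))  # everything about one member
    print(match(data, p="ex:rank"))  # one property across the whole group
```

The same data answers both "tell me about this member" and "show this attribute for the whole group" without any page-scraping step.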

The house of folksonomies called del.icio.us is actually a bona fide Data Web oriented Data Source (or Data Space). The same applies to Flickr, Google Base, eBay, Amazon, and many others.

Sample Links:

Note: I am demonstrating folksonomies in the context of RDF instance data (supposedly mutually exclusive by way of a misrepresentation that has unfortunately crystallized over the years).

In both cases, click on the enhanced hyperlinks in the "O" (object) column and choose the Dereference (Get Metadata) option. This is what traversal of Web Data links (typed as opposed to untyped links) is all about. This is why the Data Web (Web 3.0) is basically an unravelling of the foundation layer of the Semantic Web (meaning there will be other dimensions of Web Interaction, i.e. 4.0...N, over time). Whether we change names or numbers, the fundamental concept of a "Semantic Web" as originally envisioned in 1998 simply will not go away. The truth lies in the data (across every blog post, comment, discussion, wikiword, picture, tagbase, etc.), and the data will simply become more meaningful and accessible over time.
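Dereferencing typed links, as described above, amounts to "follow your nose" traversal: fetch what is published about a URI, then follow any URI-valued objects to fetch more. The sketch below simulates the Web with an in-memory dictionary instead of making HTTP requests; the `example.org` URIs and data are hypothetical, while the predicates come from the FOAF vocabulary. Treating any object that starts with `http://` as a link is a simplification.

```python
from collections import deque

# A stand-in for the Web: each URI "dereferences" to the triples published about it.
# All URIs and data here are hypothetical.
WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ],
    "http://example.org/bob": [
        ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
        ("http://example.org/bob", "http://xmlns.com/foaf/0.1/based_near", "http://example.org/boston"),
    ],
}


def dereference(uri):
    """Pretend HTTP GET: return the triples published at uri (empty if nothing is there)."""
    return WEB.get(uri, [])


def crawl(start, max_hops=2):
    """Breadth-first traversal that follows URI-valued objects up to max_hops links away."""
    seen = {start}
    queue = deque([(start, 0)])
    graph = []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in dereference(uri):
            graph.append((s, p, o))
            # Only URI objects are links to follow; literals such as names are not.
            if o.startswith("http://") and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return graph


if __name__ == "__main__":
    for triple in crawl("http://example.org/alice"):
        print(triple)
```

The predicate tells the crawler what kind of link it is following (a person known, a place nearby), which is what distinguishes typed links from plain hyperlinks.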

Nova Spivack [03.11.07 07:26 PM]

Hey Tim, my company Radar Networks http://www.radarnetworks.com -- is a venture-funded stealth company working on a big consumer implementation of the semantic web. We've been working on our platform for several years, first on our own, then with SRI/DARPA, and now we're getting close to a beta of our first app. You might also find what I've been blogging about to be of interest -- I've been writing about the future of the semantic web: http://www.mindingtheplanet.net/

Lew Tucker is our CTO (creator of Salesforce AppExchange). I think you know him.

We invite you to drop by sometime to see what we're doing.

Best regards,

Nova Spivack

Laurens Holst [03.11.07 10:11 PM]

The problem with a lack of standardisation in model and publication format (RDF) is that you have to concoct a new method to retrieve and process the information every time you want to pull in a new source.

Plus, the RDF model is a really nice one, too; I'd like to see graph-based RDF stores replace relational databases sooner rather than later :).

And the idea that all standardisation should happen after the fact or it'll be premature doesn't make much sense to me. Sure, that's how it happened in certain cases, and it might have been the best way for those cases, but it seems like a big oversimplification to adopt that as a general rule. It's easy to identify several drawbacks and positives in either approach.

~Grauw
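The point about a shared model and publication format can be sketched: when every source publishes in the same format, one parser serves them all and merging is a set union. The parser below handles only a tiny subset of N-Triples (IRIs and plain literals without escapes, datatypes, or language tags), so it is illustrative rather than conforming; the sources, URIs, and the nickname are hypothetical.

```python
import re

# A tiny subset of N-Triples: IRIs in <...> and plain "..." literals with no
# escapes, datatypes, or language tags. Real data needs a conforming parser.
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$')


def parse_ntriples(text):
    """Parse the supported subset into a set of (subject, predicate, object) tuples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line}")
        s, p, iri, literal = m.groups()
        triples.add((s, p, iri if iri is not None else literal))
    return triples


def merge(*documents):
    """One shared model means merging sources is just a union of their triples."""
    merged = set()
    for doc in documents:
        merged |= parse_ntriples(doc)
    return merged


if __name__ == "__main__":
    # Two hypothetical sources that happen to mention the same person URI.
    source_a = "<http://a.example/doc1> <http://purl.org/dc/elements/1.1/creator> <http://people.example/p1> ."
    source_b = '<http://people.example/p1> <http://xmlns.com/foaf/0.1/nick> "example" .'
    for triple in sorted(merge(source_a, source_b)):
        print(triple)
```

Because both sources use the same person URI, the merged set can be joined without any source-specific glue code, and duplicate triples simply collapse.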

Jeff Pollock [03.11.07 10:20 PM]

Hi Tim-

I'm currently with Oracle and we have a number of products that rely on and enable Semantic Web specified languages.

Your quote, "...I do think that Web 2.0 is the Semantic Web as it actually is happening, rather than as it was imagined and specified," still misses the point somehow.

In fact, the Semantic Web is happening, and it's happening to specification (RDF & OWL). And yes, it's also true that more specifications are still needed.

The inaccurate myths characterizing the Semantic Web as (a) top-down or (b) simple deductive logic trivialize a much more foundational truth.

Structured information is of high value. People have always and will always need and find value in structured information. The Semantic Web is simply a flexible, expressive, and federated way to structure information.

For the Web 2.0 folks, I'm just happy that the Semantic Web will always be backwards compatible with folksonomies etc.

The SDForum is hosting a panel on this topic this coming Wednesday:
http://web2express.org/openlab/2007/02/27/the-tortoise-vs-the-hare-ontology-vs-folksonomy

Best Regards, -Jeff-

Jim Hendler [03.12.07 04:41 AM]

Tim - To be fair to you, I do think there are definitely reasons why the perceptions and the reality of SW have gotten somewhat out of line - and there are certainly some major players (incl. some of my own researchers at the MIND Lab) who do think the "high end" ontologies coupled with reasoners are important to the eventual success of the Semantic Web (and I can appreciate their arguments) - my Dark Side commentary was aimed at this group, so terms like "polluted" are perhaps stronger than they might be if writing to the wider audience.

To me, the important aspect of the new work is that the "webizing" of the AI work allows us to let it play nice with other technologies in a way it never could before -- at the risk of once again pushing my own work, Ora Lassila and I gave a talk at the Semantics 2006 conference (slides from here), in which we tried to show how the SW emerges from both Web and AI sides and the importance of these growing together - we also noted the synergy of Web 2.0 and SW.

I hope sometime we overlap in a forum where we get a chance for direct exchange of these views (we've been at a couple meetings where we talked on different days) and it's great to see that these somewhat separate communities are realizing that the wonderful thing about the Web (and the true strength of TimBL's original vision) is that all of these things can coexist and link together - creating a world where everyone profits from anyone's good ideas...

Frederick Giasson [03.12.07 06:07 AM]

Hi Mr. O'Reilly,

Just my two pennies in this really interesting discussion.... when you say:

"That being said, I do think that Web 2.0 is the Semantic Web as it actually is happening, rather than as it was imagined and specified. PageRank is such a great example. There was no ontology at all -- fixed OR extensible -- just a realization that there was already meaning to be extracted from the data that was already there."

Consider where the semantic web will really come from... in fact, it will come from something that everybody has known for about 35 years: relational databases.

In fact, all the data is there, waiting to be used by the semantic web. The popular thinking is: the semantic web will be able to happen once it reaches some threshold of data, then people will start to develop applications... etc... etc... etc...

In my humble opinion: Wrong

It will probably happen once we have converted enough data into RDF, and once that data is linked together. Check what we are doing with DBpedia, MusicBrainz, census data, etc. In fact, the Linked Open Data community is doing exactly what Freebase does, except that it is converting everything into RDF.

What does that mean? It means that there will be replicas of the Freebase DB everywhere on the Web. It means that advanced search tools will be able to easily query that dump of data. People will be able to describe things in these DB systems from anywhere on the web (their personal web page, some portal, etc.). The data will be manipulated by anyone from anywhere in a decentralized way (as opposed to systems such as Wikipedia, Freebase, Google Base, etc.). See it as a distributed Wikipedia...

Well, you will ask me: great ideas, but when will it happen? I would say: soon enough, but I would suggest you follow SemWeb blogs and mailing lists to assess it for yourself (planetrdf.com is a good start).

Have a great day,

Take care,

Fred
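The point that the data already sits in relational databases can be illustrated by a naive row-to-triples mapping: each row gets a subject URI built from its primary key, and each non-null column becomes a predicate. This is a minimal sketch using Python's built-in `sqlite3`; the `http://data.example.org/` namespace, the `artist` table, and its rows are invented, and real conversion projects use far richer mappings.

```python
import sqlite3

BASE = "http://data.example.org/"  # hypothetical namespace for generated URIs


def table_to_triples(conn, table, key):
    """Map each row to a subject URI from its key and each non-null column to a triple."""
    # Table and column names are interpolated directly, so they must be trusted input.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [desc[0] for desc in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", str(value)))
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO artist VALUES (?, ?, ?)",
        [(1, "Example Band", "CA"), (2, "Another Act", None)],
    )
    for triple in table_to_triples(conn, "artist", "id"):
        print(triple)
```

Once rows carry URIs, another site can make statements about `http://data.example.org/artist/1` directly, which is the kind of decentralized linking described above.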

Alex Hammer [03.12.07 11:38 AM]

I enjoyed Esther Dyson's post on Huffingtonpost.com today about Metaweb. Very detailed and incisive (easy to see why she's an icon).

Craig Hubley [03.18.07 02:36 PM]

What everyone seems to forget here is that all of the pre-1990 hypertext systems supported typed (or "semantic") links. What was unique about the WWW and HTML was that it had no explicit semantic links (only the hook for them, "REL" and "REV"). From the moment the first WWW browser (Lynx) was running (I was one of the first dozen users), we were all arguing about how to add semantic links.

By 1995 I was arguing that "the separate, non-URL, name space and binding mechanism for link types" had probably killed them off and that there was no alternative but allowing link types to be looked up online, with only a few hardwired defaults, much as plugins work. This was in fact the way RDF eventually evolved. However, what RDF lost was the low-overhead bottom-up loose-typing we were assuming. I don't see much viability in a strong ontology imposed from above. Never did.

So, almost twelve years later, what do I think? This:

1. no matter how many types of links you support there will always be a lot of data just written in natural language without explicit formal link types - few people will ever learn such notations unless/until someday they are taught in grade one

2. natural language works because semantics (in the linguists' sense) works - because we know what most links are for, based on what they say; this works better if we avoid anchor text and insist on naming pages with phrases we actually use in sentences (no capital letters except on proper nouns and acronyms also implied);

2a. mediawiki monoculture is the future because it has critical mass, but it got that critical mass because it respected English conventions and eventually came to respect all language conventions; there are now several tools that work with the mediawiki format

2b. such ruthlessly specific naming conventions as the ECG advocates make the semantics easier to dig out of the language - they at least augment, maybe replace, a declared semantics in link types

2c. most of the important name precedents have been already set, e.g. some verb names in REST (though REST gets some things very wrong, e.g. a DELETE is just a REDIRECT to a 404 - who needs it)

3. the real web 3.0 will be, like all serious innovations, a more layered set of services where higher-level functions aren't imposed, and where each protocol does one thing;

3a. sociosemantic webs are already growing everywhere, but in bad formats - a massive social network fallout will help here

3b. democratic domains are now emerging, with examples like Wikipedia ArbCom

3c. open configuration is the next infrastructure revolution; the main value of a monoculture is it becomes easy to switch your provider; go to editthis.info sometime and see how fast you can set up a working mediawiki - all they need is tools to make it easy to fork to another provider, and you've got your web 2.0 all there, just add semantic mediawiki or equivalent jamwiki code.

4. the real barrier to progress here is idiot VCs who will be funding bad incompatible "better wiki" formats and creating data jails, and idiot corporate users who will buy based on entirely bogus "security features"; strangely the US govt itself is not fooled and does understand what is required, have a look at Intellipedia (as much as you can)

I'm less sure of the underlying formalisms that will happen to win the lemming race, but I suggest that microformats (a terrible name for a pattern) is the closest thing to my "linkas:", and "folksonomy" or (far better) "weak ontology" asks enough of the right questions to make at least the important data searchable and comparable. Strong ontologies come from politics, biology, ecology and no other application areas - these provide us our "givens" and they're what government is constrained by and always will be: bodies, ecosystems, and frames.

So what's the next step to web 2.x? Quoting Intellipedia's shovel: "...it's wiki, wiki, baby!" Wikipedia itself is already web 1.5... and it'd be web 1.8 if the trolls took it over, and kicked out Wales, who only wants to control it so he can hawk his fourth-rate search engine.
