Facebook Open Graph: A new take on semantic web

A few weeks ago, Facebook announced an Open Graph initiative —
a move considered to be a turning point not just for the social networking giant,
but for the web at large. The company’s new vision is no longer to just connect people.
Facebook now wants to connect people around and across the web through concepts
they are interested in.

This vision of the web isn’t really new. Its origins go back to the person who invented the web,
Sir Tim Berners-Lee. This vision has been passionately shared and debated by the tech community
over the last decade. What Facebook has announced as Open Graph has been envisioned by many
as the semantic web.

The web of people and things

At the heart of this vision is the idea that different web pages contain the same objects. Whether someone is reading about a book on Barnes and Noble, on O’Reilly or on a book review blog doesn’t matter. What matters is that the reader is interested in this particular book. And so it makes sense to connect her to friends and other readers who are interested in the same book — regardless of when and where they encountered it.

The same is true about many everyday entities that we find on the web — movies, albums, stars, restaurants, wine, musicians, events, articles, politicians, etc — the same entity is referenced in many different pages. Our brains draw the connections instantly and effortlessly, but computers can’t deduce that an “Avatar” review on Cinematical.com is talking about the movie also described on a page
on IMDB.com.

The reason it is important for things to be linked is so that people can be connected around their
interests and not around websites they visit. It does not matter to me where my friends are reading about
“Avatar”; what matters is which of my friends liked the movie and what they had to say. Without interlinking
objects across different sites, the global taste graph is too sparse and uninteresting. By re-imagining the
web as the graph of things we are interested in, a new dimension, a new set of connections gets unlocked — everything and everyone connects in a whole new way.

A brief history of semantic markups

The problem of building the web of people and things boils down to describing what is on the page and linking it to other pages. In Tim Berners-Lee’s original vision, the entities and relationships between them would be described using RDF. This mathematical language
was designed to capture the essence of objects and relationships in a precise way. While it’s true that RDF annotation would be the most complete, it also turns out to be quite complicated.

It is this complexity that the community has attempted to address over the years. A simpler approach
called Microformats was developed by Tantek Celik, Chris Messina and others.
Unlike RDF, Microformats rely on existing XHTML standards and leverage CSS classes to mark up the content.
Critically, Microformats don’t add any additional information to the page, but just annotate the data that is already on the page.
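To make the class-based approach concrete, here is a minimal sketch in Python. It assumes an hCard-style snippet that uses the `vcard`, `fn` and `org` class names; the person, the organization and the `HCardParser` helper are made up for illustration, and a real Microformats parser handles far more cases than this one does.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the visible text is unchanged, only class
# names are added to say what each piece of text means.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span> works at
  <span class="org">Example Books</span>.
</div>
"""


class HCardParser(HTMLParser):
    """Collects the text of elements whose class is a known hCard field."""

    FIELDS = {"fn", "org"}  # only the two fields used in this sketch

    def __init__(self):
        super().__init__()
        self.card = {}
        # One entry per open element: the field name it carries, or None.
        # Assumes well-nested markup without void elements such as <br>.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in self.FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        field = self._stack[-1] if self._stack else None
        if field:
            self.card[field] = self.card.get(field, "") + data.strip()


parser = HCardParser()
parser.feed(SAMPLE)
card = parser.card
print(card)
```

Note that nothing in the sample is hidden from the reader: the name and the organization are the same text a visitor sees, which is the point of the approach.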

Microformats enjoyed support and wider adoption because of their relative simplicity and focus on marking up the existing content. But there are still issues. First, the number of supported entities is limited: the focus has been on marking up organizations, people and events, and then reviews, but there is no way to mark up, for example, a movie, a book or a song. Second, Microformats are somewhat cryptic and hard to read. There is cleverness involved in figuring out how to do the markup, which isn’t necessarily a good thing.

In 2005, inspired by Microformats, Ian Davis, now CTO of Talis, developed eRDF — a syntax
within HTML for expressing a simplified version of RDF. His approach married the canonical concepts of RDF and the idea
from Microformats that the data is already on the page. An iteration of Ian’s work,
called RDFa, has been adopted as a W3C standard. All the signs
point in the direction of RDFa being the solution of choice for describing entities inside HTML pages.

Until recently, despite the progress in the markups, adoption was hindered by the fact that publishers lacked the incentive to annotate the pages. What is the point if there are no applications that can take advantage of it?
Luckily, in 2009 both Yahoo and Google put their muscle behind marking up pages.

First Yahoo developed an elegant search application called Search Monkey.
This app encouraged and enabled sites to take control over how Yahoo’s search engine presented the results. The solution
was based on both markup on the page and a developer plugin, which gave the publishers control over presenting the results to the user.
Later, Google announced rich snippets. This supported both Microformats and RDFa markup and enabled webmasters to control how their search results are presented.

Still missing from all this work was a simple common vocabulary for describing everyday things. In 2008-2009, with help from Peter Mika from Yahoo research, I developed a markup called abmeta.
This extensible, RDFa-based markup provided a vocabulary for describing everyday entities like movies, albums, books,
restaurants, wines, etc. Designed with simplicity in mind, abmeta supports declaring single and multiple entities on
the page, using both meta headers and also using RDFa markup inside the page.

Facebook Open Graph protocol

The markup announced by Facebook can be thought of as a subset of abmeta because it supports the declaration
of entities using meta tags. The great thing about this format is its simplicity. It is literally readable in English.

The markup defines several essential attributes — type, title, URL, image and description. The protocol comes with a reasonably rich taxonomy of types, supporting entertainment, news, location, articles
and general web pages. Facebook hopes that publishers will use the protocol to describe the entities on pages.
When users press the LIKE button, Facebook will get not just a link, but a specific object of the specific type.
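As a rough illustration of how little is involved, the sketch below reads the five essential attributes from a page header using only the Python standard library. The `og:`-prefixed `property` names follow the published protocol as I understand it; the sample page, its URLs and the `extract_open_graph` helper are placeholders made up for this example, and the type value `movie` is an assumption about the taxonomy.

```python
from html.parser import HTMLParser

# Hypothetical page header; the URLs and description are placeholders.
SAMPLE_PAGE = """
<html>
<head>
<title>Avatar (2009)</title>
<meta property="og:title" content="Avatar" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://example.com/movies/avatar" />
<meta property="og:image" content="http://example.com/images/avatar.jpg" />
<meta property="og:description" content="Sample description of the movie." />
</head>
<body></body>
</html>
"""


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and "content" in attr_map:
            # Strip the "og:" prefix so keys read as plain attribute names.
            self.properties[prop[3:]] = attr_map["content"]


def extract_open_graph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


og = extract_open_graph(SAMPLE_PAGE)
for name, value in og.items():
    print(name, "=", value)
```

The result is a single flat mapping of attribute names to values for the whole page, which is what makes the format so easy to read and to publish.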

If all of this computes correctly, Facebook should be able to display a rich collection of entities on user profiles,
and should be able to show you the friends who liked the same thing around the web, regardless of the site. So by
publishing this protocol and asking websites to embrace it, Facebook clearly declares its foray
into the web of people and things — aka, the semantic web.

Technical issues with Facebook’s protocol

As I’ve previously pointed out in my post on ReadWriteWeb,
there are several issues with the markup that Facebook proposed.

1. There is no way to disambiguate things. This is quite a miss on Facebook’s part, which is already
resulting in bogus data on user profiles. The ambiguity arises because the protocol lacks secondary
attributes for some data types. For example, it is not possible to distinguish a movie from its remake. Typically,
such disambiguation would be done by using either a director or a year property, but Facebook’s protocol does
not define these attributes. This leads to duplicates and dirty data.

2. There is no way to define multiple objects on the page. This is another rather surprising limitation,
since previous markups, like Microformats and abmeta, support this use case. Of course if Facebook only cares about
getting people to LIKE pages so that they can do better ad targeting, then having multiple objects inside the
page is not necessary. But Facebook claimed and marketed this offering as the semantic web, so it is surprising that there
is no way to declare multiple entities on a single page. Surely a comprehensive solution ought to do that.

3. An open protocol can’t be closed. Finally, Facebook has done this without collaborating with anyone.
For something to be rightfully called an Open Graph Protocol, it should be developed in open collaboration with
the web. Surely, Google, Yahoo!, W3C and even small startups playing in the semantic web space would have good things
to contribute here.
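The first two issues can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Facebook processes pages: the `year` attribute is hypothetical (the protocol does not define it), and `entity_key`, `read_flat` and `read_grouped` are names invented for this example.

```python
def entity_key(props, secondary=()):
    """Build an identity key from type and title, plus optional extra attributes."""
    base = (props["type"], props["title"].strip().lower())
    return base + tuple(props.get(name) for name in secondary)


# Issue 1: a movie and its remake share a type and a title.
# "year" is a hypothetical secondary attribute, not part of the protocol.
original = {"type": "movie", "title": "Clash of the Titans", "year": 1981}
remake = {"type": "movie", "title": "Clash of the Titans", "year": 2010}

same_without_year = entity_key(original) == entity_key(remake)
same_with_year = entity_key(original, ("year",)) == entity_key(remake, ("year",))


def read_flat(tags):
    """Collapse (attribute, value) pairs into one mapping; later values win."""
    result = {}
    for name, value in tags:
        result[name] = value
    return result


def read_grouped(tags, start="title"):
    """Hypothetical alternative: begin a new entity each time `start` appears."""
    entities = []
    for name, value in tags:
        if name == start or not entities:
            entities.append({})
        entities[-1][name] = value
    return entities


# Issue 2: a page that mentions two movies, described with flat page-level tags.
tags = [
    ("title", "Avatar"), ("type", "movie"),
    ("title", "Titanic"), ("type", "movie"),
]
flat = read_flat(tags)
grouped = read_grouped(tags)

print(same_without_year, same_with_year)
print(flat)
print(grouped)
```

Without the extra attribute the two films collapse into one key, which is exactly how duplicates and dirty data arise. Likewise, a consumer that reads one flat set of attributes per page ends up with a single object, while some grouping convention is needed before a page can describe several.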

It sadly appears that getting the semantic web elements correct was not the highest priority for Facebook. Instead, the announcement seems to be a competitive move against Twitter, Google and others, with the goal of locking in publishers by giving them a simple way to recycle traffic.

Where to next?

Despite the drawbacks, there is no doubt that Facebook’s announcement is a net positive for the web at large.
When one of the top companies takes a 180-degree turn and embraces a vision that’s been discussed
for a decade, everyone stops and listens. The web of people and things is now both very
important and a step closer. The questions are: What is the right way? And how do we get there?

For starters, it would be good to fill in some holes in Facebook Open Graph. Whether it is the right way overall or not, at least
we need to make it complete. It is important to add support for the secondary attributes necessary for disambiguation, and
also to add support for multiple entities inside the page (even if there is only one LIKE button on the
whole page). Both of these are already addressed by Microformats and abmeta, so it should be easy to fix.

Beyond technical issues, Facebook should open up this protocol and make it owned by the community, instead of being
driven by one company’s business agenda. A true roundtable with major web companies, publishers, and small startups would
result in a correct, comprehensive and open protocol. We want to believe that Facebook will do the right thing
and will collaborate with the rest of the web on what has been important work spanning years for many of us.
The prospects are exciting, because we just made a giant leap. We just need to make sure we land in the right place.