Facebook Open Graph: A new take on semantic web

Facebook's Open Graph is both an important step and one that still needs work.

A few weeks ago, Facebook announced an Open Graph initiative,
a move considered to be a turning point not just for the social networking giant,
but for the web at large. The company’s new vision is no longer to just connect people.
Facebook now wants to connect people around and across the web through concepts
they are interested in.

This vision of the web isn’t really new. Its origins go back to the person who invented the web,
Sir Tim Berners-Lee. This vision has been passionately shared and debated by the tech community
over the last decade. What Facebook has announced as Open Graph has been envisioned by many
as the semantic web.

The web of people and things

At the heart of this vision is the idea that different web pages contain the same objects. Whether someone is reading about a book on Barnes and Noble, on O’Reilly or on a book review blog doesn’t matter. What matters is that the reader is interested in this particular book. And so it makes sense to connect her to friends and other readers who are interested in the same book — regardless of when and where they encountered it.

The same is true about many everyday entities that we find on the web — movies, albums, stars, restaurants, wine, musicians, events, articles, politicians, etc — the same entity is referenced in many different pages. Our brains draw the connections instantly and effortlessly, but computers can’t deduce that an “Avatar” review on Cinematical.com is talking about the movie also described on a page
on IMDB.com.

The reason it is important for things to be linked is so that people can be connected around their
interests and not around websites they visit. It does not matter to me where my friends are reading about
“Avatar”; what matters is which of my friends liked the movie and what they had to say. Without interlinking
objects across different sites, the global taste graph is too sparse and uninteresting. By re-imagining the
web as the graph of things we are interested in, a new dimension, a new set of connections gets unlocked — everything and everyone connects in a whole new way.

A brief history of semantic markups

The problem of building the web of people and things boils down to describing what is on the page and linking it to other pages. In Tim Berners-Lee’s original vision, the entities and relationships between them would be described using RDF. This mathematical language
was designed to capture the essence of objects and relationships in a precise way. While it’s true that RDF annotation would be the most complete, it also turns out to be quite complicated.
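
To make the idea concrete: RDF reduces every statement to a subject-predicate-object triple. The sketch below models that in plain Python rather than with a real RDF library, and the URLs and predicate names (`type`, `title`, `reviews`) are invented for illustration only.

```python
from collections import namedtuple

# One statement in the RDF model: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical identifiers, made up for this sketch.
MOVIE = "http://example.com/movies/avatar-2009"
REVIEW = "http://example.org/reviews/123"

triples = [
    Triple(MOVIE, "type", "Movie"),
    Triple(MOVIE, "title", "Avatar"),
    Triple(REVIEW, "type", "Review"),
    Triple(REVIEW, "reviews", MOVIE),
]


def subjects_linked_to(target, graph):
    """Return the subjects that have any statement pointing at target."""
    return sorted({t.subject for t in graph if t.obj == target})


linked = subjects_linked_to(MOVIE, triples)
print(linked)
```

Because the review names the movie by its identifier rather than by its title, a program can follow the link without guessing which "Avatar" is meant.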

It is this complexity that the community has attempted to address over the years. A simpler approach
called Microformats was developed by Tantek Celik, Chris Messina and others.
Unlike RDF, Microformats rely on existing XHTML standards and leverage CSS classes to mark up the content.
Critically, Microformats don’t add any additional information to the page, but just annotate the data that is already on the page.
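
As a rough illustration of annotating data that is already on the page, the sketch below reads two hCard properties (`fn` and `org`) out of class attributes using Python's standard `html.parser`. The sample markup and the `ClassTextParser` helper are hypothetical, and the parser is deliberately naive: it assumes well-nested markup with no void elements such as `<br>`.

```python
from html.parser import HTMLParser

# Sample fragment using hCard class names; the content is invented.
SAMPLE = (
    '<div class="vcard">'
    '<span class="fn">Jane Doe</span> works at '
    '<span class="org">Example Books</span>'
    '</div>'
)


class ClassTextParser(HTMLParser):
    """Collect the visible text of elements carrying the wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []   # one entry per open element: matched class or None
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.wanted), None)
        self.stack.append(match)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open element, if it matched.
        if self.stack and self.stack[-1]:
            key = self.stack[-1]
            self.found[key] = self.found.get(key, "") + data


parser = ClassTextParser(["fn", "org"])
parser.feed(SAMPLE)
card = parser.found
print(card)
```

Note that the name and organization are the same text a human reader sees; the class attributes only label it.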

Microformats enjoyed support and wider adoption because of their relative simplicity and focus on marking up the existing content. But there are still issues. First, the number of supported entities is limited, the focus has been on marking organizations, people and events, and then reviews, but there is no way to markup, for example, a movie or a book or a song. Second, Microformats are somewhat cryptic and hard to read. There is cleverness involved in figuring out how to do the markup, which isn’t necessarily a good thing.

In 2005, inspired by Microformats, Ian Davis, now CTO of Talis, developed eRDF — a syntax
within HTML for expressing a simplified version of RDF. His approach married the canonical concepts of RDF and the idea
from Microformats that the data is already on the page. An iteration of Ian’s work,
called RDFa, has been adopted as a W3C standard. All the signs
point in the direction of RDFa being the solution of choice for describing entities inside HTML pages.

Until recently, despite the progress in the markups, adoption was hindered by the fact that publishers lacked the incentive to annotate the pages. What is the point if there are no applications that can take advantage of it?
Luckily, in 2009 both Yahoo and Google put their muscle behind marking up pages.

First Yahoo developed an elegant search application called Search Monkey.
This app encouraged and enabled sites to take control over how Yahoo’s search engine presented the results. The solution
was based on both markup on the page and a developer plugin, which gave the publishers control over presenting the results to the user.
Later, Google announced rich snippets. This supported both Microformats and RDFa markup and enabled webmasters to control how their search results are presented.

Still missing from all this work was a simple common vocabulary for describing everyday things. In 2008-2009, with help from Peter Mika from Yahoo Research, I developed a markup called abmeta.
This extensible, RDFa-based markup provided a vocabulary for describing everyday entities like movies, albums, books,
restaurants, wines, etc. Designed with simplicity in mind, abmeta supports declaring single and multiple entities on
the page, using both meta headers and also using RDFa markup inside the page.

Facebook Open Graph protocol

The markup announced by Facebook can be thought of as a subset of abmeta because it supports the declaration
of entities using meta tags. The great thing about this format is simplicity. It is literally readable in English.

The markup defines several essential attributes — type, title, URL, image and description. The protocol comes with a reasonably rich taxonomy of types, supporting entertainment, news, location, articles
and general web pages. Facebook hopes that publishers will use the protocol to describe the entities on pages.
When users press the LIKE button, Facebook will get not just a link, but a specific object of the specific type.
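
Here is a minimal sketch of what a consumer of this markup might do, assuming the attributes appear as `<meta>` tags whose `property` names start with `og:`. The sample page, its URLs, and the `parse_open_graph` helper are placeholders of my own, not anything Facebook ships.

```python
from html.parser import HTMLParser

# Hypothetical page head; the URLs are placeholders.
SAMPLE_HEAD = """
<head>
<title>Avatar (2009)</title>
<meta property="og:type" content="movie" />
<meta property="og:title" content="Avatar" />
<meta property="og:url" content="http://example.com/movies/avatar" />
<meta property="og:image" content="http://example.com/img/avatar.jpg" />
<meta property="og:description" content="A film directed by James Cameron." />
</head>
"""


class OpenGraphParser(HTMLParser):
    """Collect og:* properties declared in meta tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Store the value under the name without the "og:" prefix.
            self.properties[prop[3:]] = attr["content"]


def parse_open_graph(html):
    """Return a dict of Open Graph properties found in the given HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


og = parse_open_graph(SAMPLE_HEAD)
print(og["type"], og["title"])
```

The result is a flat dictionary describing one object per page, which is exactly the simplicity (and the limitation) discussed below.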

If all of this computes correctly, Facebook should be able to display a rich collection of entities on user profiles,
and should be able to show you friends who liked the same thing around the web, regardless of the site. So by
publishing this protocol and asking websites to embrace it, Facebook clearly declares its foray
into the web of people and things — aka, the semantic web.

Technical issues with Facebook’s protocol

As I’ve previously pointed out in my post on ReadWriteWeb,
there are several issues with the markup that Facebook proposed.

1. There is no way to disambiguate things. This is quite a miss on Facebook’s part, which is already
resulting in bogus data on user profiles. The ambiguity arises because the protocol lacks secondary
attributes for some data types. For example, it is not possible to distinguish a movie from its remake. Typically,
such disambiguation would be done by using either a director or a year property, but Facebook’s protocol does
not define these attributes. This leads to duplicates and dirty data.

2. There is no way to define multiple objects on the page. This is another rather surprising limitation,
since previous markups, like Microformats and abmeta, support this use case. Of course if Facebook only cares about
getting people to LIKE pages so that they can do better ad targeting, then having multiple objects inside the
page is not necessary. But Facebook claimed and marketed this offering as semantic web, so it is surprising that there
is no way to declare multiple entities on a single page. Surely a comprehensive solution ought to do that.

3. An open protocol can’t be closed. Finally, Facebook has done this without collaborating with anyone.
For something to be rightfully called an Open Graph Protocol, it should be developed in an open collaboration with
the web. Surely, Google, Yahoo!, W3C and even small startups playing in the semantic web space would have good things
to contribute here.
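
The first issue is easy to demonstrate with a small sketch. Suppose a consumer groups pages into entities by type and title alone, which is roughly all the protocol offers for a movie. The `year` field below is hypothetical, since the protocol does not define such an attribute, and `group_by` is a helper written just for this example.

```python
# Two pages describing different films that share a title.
# The "year" field is hypothetical: the protocol does not define it.
pages = [
    {"type": "movie", "title": "The Karate Kid", "year": 1984},
    {"type": "movie", "title": "The Karate Kid", "year": 2010},
]


def group_by(pages, fields):
    """Group pages into entities keyed by the given fields."""
    entities = {}
    for page in pages:
        key = tuple(page[f] for f in fields)
        entities.setdefault(key, []).append(page)
    return entities


# Without a secondary attribute, the original and the remake collapse.
merged = group_by(pages, ["type", "title"])

# With a year in the key, they stay separate.
distinct = group_by(pages, ["type", "title", "year"])

print(len(merged), len(distinct))
```

With only type and title in the key, both films end up as a single entity, which is the kind of duplicate and dirty data described in the first point.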

It sadly appears that getting the semantic web elements correct was not the highest priority for Facebook. Instead, the announcement seems to be a competitive move against Twitter, Google and others, with the goal of locking in publishers by giving them a simple way to recycle traffic.

Where to next?

Despite the drawbacks, there is no doubt that Facebook’s announcement is a net positive for the web at large.
When one of the top companies takes a 180-degree turn and embraces a vision that’s been discussed
for a decade, everyone stops and listens. The web of people and things is now both very
important and a step closer. The questions are: What is the right way? And how do we get there?

For starters, it would be good to fill in some holes in Facebook Open Graph. Whether it is the right way overall or not, at least
we need to make it complete. It is important to add support for the secondary attributes necessary for disambiguation, and
equally important to add support for multiple entities inside the page (even if there is only one LIKE button on the
whole page). Both of these are already addressed by Microformats and abmeta, so it should be easy to fix.

Beyond technical issues, Facebook should open up this protocol and make it owned by the community, instead of being
driven by one company’s business agenda. A true roundtable with major web companies, publishers, and small startups would
result in a correct, comprehensive and open protocol. We want to believe that Facebook will do the right thing
and will collaborate with the rest of the web on what has been important work spanning years for many of us.
The prospects are exciting, because we just made a giant leap. We just need to make sure we land in the right place.

  • bruce wayne

    ….Great write up !!!!!!…but I think you should disclose that you are the CEO of a company that Links data and that is a competitor to Facebook in this area…..

    ———————
    I’m one of the founders of Factoetum.com and we are a very different linked data company…..We want to put the end user back in control of the content that they own and we want to change the paradigm to one where consumers are no longer programmed…to a world where we are all programmers…

    One of the key things for me is that so many have put so much stock into the RDF methodology being the right way to go at it…and the derivatives of RDF are still by and large in agreement with RDF methodologies….As it is, none of these work very well and very few non-technical people (most of the world) could easily add any of the needed markup to their content….or create an application that can ingest and reuse the information….I think that we need to take another look at how to name and link data….one that is not dependent on the previous theory of RDF…..

    One of the major sticking points in all of this for me is that….NO COMPANY should own consumer “Linked Data”; this data should be owned by the consumer and consumer communities..It is interesting to me to read about “Companies” pushing for or creating “open” standards that would clearly give them leverage over competitors, developers and the consumers that own the content…..

    http://www.factoetum.com/factoetum/Rolling_Stone_500_Greatest_Albums_of_All_Time

  • Phil Simon

    The semantic web is becoming a reality, although we have major hurdles. I did a podcast with David Siegel, author of Pull.

    http://www.philsimonsystems.com/content/tech-today/technology-today-20-semantic-web/

    The web will get smarter; it’s just going to take some time.

  • Manu Sporny

    Hi Alex,

    Good article overall. I noticed one paragraph that has a number of factual errors:


    In 2005, inspired by Microformats, Ian Davis, now CTO of Talis, developed eRDF — a syntax within HTML for expressing a simplified version of RDF. His approach married the canonical concepts of RDF and the idea from Microformats that the data is already on the page. An iteration of Ian’s work, called RDFa, has been adopted as a W3C standard. All the signs point in the direction of RDFa being the solution of choice for describing entities inside HTML pages.

    RDFa is not an iteration of Ian’s work on eRDF – RDFa was started in early 2004 by Mark Birbeck… before eRDF and around the same time as Microformats. eRDF, DC-HTML, Microformats, Microdata and RDFa all share some basic goals and have cross-pollinated heavily, but to say that RDFa is solely an iteration of the work by Ian (as great of a guy as he is) isn’t fair to those that started and continue to do the work on RDFa. It is also not fair to the RDFa community whose dedication, reviews, feedback and implementations have made things like OGP, Drupal, Best Buy, Digg and the UK Government happen.

    The RDFa WG participants can be found here:

    http://www.w3.org/TR/2010/WD-rdfa-core-20100422/#a_acks

    So, some corrections:

    RDFa isn’t an iteration of Microformats, it started around the same time.

    eRDF started after RDFa.

    You should at least credit Mark Birbeck with starting RDFa and give some credit to the RDFa WG and Community – they are the ones that continue to drive innovation in this area.

    For more history on RDFa, check this link out:

    http://dev.w3.org/html5/rdfa/#history

    Thanks in advance for the corrections :)

  • Tantek

    Alex,

    +1 on Manu’s corrections – I can confirm. And a few more:

    “CSS classes” – a common misconception, they’re actually “HTML classes”, using HTML classes for styling via the CSS class selector just happens to be the most popular use.

    “still issues. First, the number of supported entities is limited,” – funny that you assert that as an issue – do you have any data to back that up?

    In contrast, I present the “limited entities” of HTML which continues to succeed on the Web (and expand, with HTML5), while the “unlimited entities” XML based web failed to take off. I think there are even some O’Reilly articles on the failure of the XML-based web. Even for APIs, JSON is replacing/displacing XML.

    There is much more evidence today that on the Web, in particular for marking up content, limited entities are what succeed and scale across the vast number of humans touching the web, rather than any form of unlimited entities. How many times do we need to repeat this Tower of Babel experiment? Or is this simply an unquestioned faith/dogma in “extensibility” that will never die?

    “there is no way to markup, for example, a movie…” – perhaps you missed:

    hMedia – http://microformats.org/wiki/hmedia

    “… or a book” – or the citation microformat in progress:

    http://microformats.org/wiki/citation – though even a plain HTML element is a good start.

    “… or a song.” – you overlooked:

    hAudio – http://microformats.org/wiki/hAudio – which Manu above happens to be a co-editor of.

    “Second, Microformats are somewhat cryptic and hard to read.” – to whom?

    To modern web designers who are used to using HTML classes to add semantics to their markup, microformats are the most natural way to do so, rather than new attributes, or random XML formats in special side-file locations.

    Regarding your critique/comments of Facebook’s OGP – I’m still withholding most of my opinions on that for now – however I will add just one thing.

    Didn’t we learn in the late 1990s and early 2000s that meta tags make a horrible transport for any kind of data over the long run? Invisible metadata rots, gets spammed, out of sync with the content, etc. Those who forget (or perhaps never experienced?) this history seem doomed to repeat it.

    Any proprietary meta-tag based format/protocol is at best “mostly harmless” (with apologies to
    http://en.wikipedia.org/wiki/Mostly_Harmless )

    P.S. Happy (belated) Towel Day.

  • Alex Iskold

    Manu / Tantek,

    Apologies for missing a few points. I was relying on my memory and Wikipedia:

    http://en.wikipedia.org/wiki/Embedded_RDF
    http://en.wikipedia.org/wiki/RDFa

    Also, my bad for not seeing newer Microformats, looks like more is possible now.

    Personally and as an engineer, I am fine with ANY markup as long as it is standardized and I am also fine with several markups, because I do not think it is hard to support several, like Yahoo! and Google do.

    It seems to me that Facebook decided that microformats would be more complex for publishers to do, and based on my experience with the publishing industry I have to agree.

    Perhaps to disprove this, we should work on equivalent markup via microformats that would enable publishers to describe the same entities?

    If this happens, I think that it would be a realistic conversation for Facebook to support both & perhaps other players would jump in as well. What do you think?

  • http://marktsinfoblog.blogspot.com Mark Thristan

    1) This is a great little overview (comments notwithstanding)

    2) I think the answer is pretty straightforward, however, and is noted in Alex’s 27 May comment: “I am also fine with several markups”.

    It is not the how that should be important, it should be the what: who cares if it’s RDFa, microformats or OGP. Small pieces loosely joined is what counts.

    At the end of the day, the philosophy is still entities/objects with URIs which can be joined by relationships (triples, subject-object-verb…whatever).

    The effort should be on crosswalking as I am doubtful that one approach will ever be fully implemented – why worry so much? The world can live with grails, rails, asp, jsp, or HTML, XHTML, HTML5, Flash – why not with a mix of these approaches for metadata description?

    If recent history shows us anything, it is that order comes from a nice big mess, and a nice juicy carrot: once one organisation gains a major financial headstart from a decent semantic web approach, everyone will follow…

    3) I kind of agree with Tantek on metadata…
