Google Announces Support for Microformats and RDFa

Don’t miss James Turner’s Interview with Google Engineering’s Othar Hansson and RV Guha

On Tuesday, Google introduced Rich Snippets, a feature that gives users a convenient at-a-glance summary of a search result. Google has been experimenting with microformats and RDFa, and is now officially introducing the feature and allowing more sites to participate. The announcement makes it clear that the technology is being phased in over time and that there is no guarantee your site’s RDFa or microformats will be parsed, but Google has given us a glimpse of the future of indexing. Read on to learn about the underlying technology and how you can prepare your own content for it.

What is RDFa?

While Google’s announcement today focuses on microformats, Google will soon release support for RDFa. From the W3C RDFa in XHTML Specification:

The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

Let’s take a quick look at a review from Amazon, and see how it would be marked up with RDFa to provide more information for Rich Snippets. First, here’s a review from the Amazon site:

[Screenshot: an Amazon customer review]

Next, let’s take a look at a (very simplified) example of markup that might be used to generate this review:

<div>
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span>5.0 out of 5 stars</span>
    <span><b>American Biographer: Jon Meacham</b></span>
  </div>
  <div><a href="http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">
            <span>Marian the Librarian</span></a> (NY, NY)  -
  </div>
  <div>
    <b>This review is from:
      <a href="http://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">
      American Lion: Andrew Jackson in the White House (Hardcover)</a></b>
  </div>
  <div class="review">
    American Lion is a wonderfully crafted biography about an incredibly interesting
    and oft-overlooked American who helped shaped this country...
  </div>
</div>

Next, let’s add RDFa markup that would allow Google to include this review in Rich Snippets. To mark up this XHTML with RDFa, you declare the data-vocabulary.org namespace (http://rdf.data-vocabulary.org/#) and annotate elements with RDFa attributes such as typeof and property. For the list of properties that Google’s indexing technology recognizes, see the RDF schema published for data-vocabulary.org. Here is the annotated review:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span property="v:rating" content="5.0">5.0 out of 5 stars</span>
    <span><b>American Biographer: Jon Meacham</b></span>
  </div>
  <div>
    <span property="v:reviewer" content="Marian the Librarian">
      <a href="http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">Marian the Librarian</a></span> (NY, NY)  -
    <span property="v:dtreviewed" content="2009-04-01">1st April 2009</span>
  </div>
  <div>
    <b>This review is from:
      <span property="v:itemreviewed" content="American Lion: Andrew Jackson in the White House (Hardcover)">
        <a href="http://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">
        American Lion: Andrew Jackson in the White House (Hardcover)</a></span></b>
  </div>
  <div class="review" property="v:description">
    American Lion is a wonderfully crafted biography about an incredibly interesting
    and oft-overlooked American who helped shaped this country...
  </div>
</div>
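To make the idea concrete, here is a minimal Python sketch of what a consumer of this markup might extract. It is an illustration under stated assumptions, not Google’s parser and not a conforming RDFa processor: the class name RDFaReviewSketch is hypothetical, it matches the literal string v:Review instead of resolving the v: prefix, and it ignores RDFa’s subject-setting rules (@about, @href, @resource). The sample is a trimmed fragment modeled on the review above; the content attributes in it, which supply machine-readable values next to the human-readable text, are an assumption of this sketch.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class RDFaReviewSketch(HTMLParser):
    """Hypothetical, simplified RDFa reader for a single v:Review item.

    Collects property -> value pairs found inside an element whose typeof
    is the literal string "v:Review". It does not resolve CURIE prefixes,
    apply @about/@href/@resource subject rules, or handle nested items.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}  # property name -> extracted value
        self._stack = []      # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # no end tag will follow, so don't track it
        a = dict(attrs)
        inside = bool(self._stack) and self._stack[-1]["in_review"]
        self._stack.append({
            "tag": tag,
            "prop": a.get("property"),
            "content": a.get("content"),
            "text": [],
            "in_review": inside or a.get("typeof") == "v:Review",
        })

    def handle_data(self, data):
        # Visible text counts toward every open element that carries @property.
        for entry in self._stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Find the innermost open element with this tag name.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                break
        else:
            return  # stray end tag: ignore it
        # Close it (and anything left unclosed inside it).
        while len(self._stack) > i:
            entry = self._stack.pop()
            if entry["prop"] and entry["in_review"]:
                value = entry["content"]  # @content overrides visible text
                if value is None:
                    value = " ".join("".join(entry["text"]).split())
                self.properties[entry["prop"]] = value


sample = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <span property="v:rating" content="5.0">5.0 out of 5 stars</span>
  <span property="v:reviewer">Marian the Librarian</span>
  <span property="v:dtreviewed" content="2009-04-01">1st April 2009</span>
  <div property="v:description">
    American Lion is a wonderfully crafted biography...
  </div>
</div>
"""

parser = RDFaReviewSketch()
parser.feed(sample)
parser.close()
for name, value in parser.properties.items():
    print(name, "=", value)
# v:rating = 5.0
# v:reviewer = Marian the Librarian
# v:dtreviewed = 2009-04-01
# v:description = American Lion is a wonderfully crafted biography...
```

Each property becomes a name/value pair attached to the review, the kind of structured record a search engine could turn into a Rich Snippet. Where an element carries a content attribute, that value wins over the visible text, which is how a page can show "1st April 2009" to readers while handing machines an ISO date.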

This initial release covers people and reviews, but Google will be slowly rolling out support for other RDFa vocabularies and microformats as they become available. For more information, see “Marking up content with RDFa” on the Google Webmaster/Site Owners Help site.

Analysis

While the Semantic Web has been around for years, it has yet to live up to the audacious promises that heralded its introduction to the world. What is the Semantic Web? Here’s the definition from Wikipedia in case you need a refresher:

Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price for a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing, and combining information on the web.

In short, the Semantic Web is about more “meaningful” content. We’ve perfected the art of scanning text and creating massive distributed indexes that produce highly relevant search results, but when you type in “Swine Flu” you are really still dealing with an inefficient indexing approach that doesn’t know about the meaning of the text being parsed and indexed. Moving toward the Semantic Web will allow our searching technologies to become more intelligent and will set the stage for the next revolution in which computing systems can become more aware of the “meaningfulness of data”.

We’ve already seen a shift toward “semantic search”: Google has been augmenting search results with Google Maps and limited catalog searches, and more recent entries into the search market, such as Amazon’s A9 and the yet-to-be-released Wolfram Alpha, differentiate themselves by the structured data and content that can be extracted from a search result. Until today, though, webmasters had no compelling reason to add RDFa or microformats to their sites so that this semantic data could be mined; Google has now given site designers that incentive. This shift toward semantic markup promises to disrupt existing SEO approaches, which are built atop the platform Google provides.

With Google in the game, semantic markup becomes an imperative: sites that want to be listed in search results with Rich Snippets will need to think about RDFa and microformats. Tools designed to present person and review data will now output Google-compatible RDFa and microformat markup by default. Blogging systems like Movable Type and WordPress, e-commerce tools like Magento, and content management systems like Alfresco and Drupal will quickly adopt the formats Google supports, and in five years’ time we won’t be able to imagine a web without semantic markup. We will reminisce about the days when search results were produced by ad hoc text processing unsupported by meaningful data. The search results you are used to today will seem quaint compared to the rich, data-centric experience of the emerging Semantic Web.

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” – Tim Berners-Lee

UPDATE (3:52PM): Several readers pointed out that we failed to mention Yahoo’s SearchMonkey, which also supports RDFa and microformats. Google is not the first search engine to support RDFa and microformats, but it has the most influence on the search market. With 72% of the search market, Google can make people pay attention to RDFa and microformats.