Google Announces Support for Microformats and RDFa

Don’t miss James Turner’s Interview with Google Engineering’s Othar Hansson and RV Guha

On Tuesday, Google introduced a feature called Rich Snippets, which gives users a convenient summary of a search result at a glance. Google has been experimenting with microformats and RDFa, and is now officially introducing the feature and allowing more sites to participate. While the Google announcement makes it clear that this technology is being phased in over time, with no guarantee that your site’s RDFa or microformats will be parsed, Google has given us a glimpse of the future of indexing. Read this article to find out about the underlying technology and how you can prepare your own content to work with it.

What is RDFa?

While Google’s announcement today focuses on microformats, the company will soon release support for RDFa. From the W3C RDFa in XHTML specification:

The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

Let’s take a quick look at a review from Amazon, and see how it would be marked up with RDFa to provide more information for Rich Snippets. First, here’s a review from the Amazon site:

[Image: anazom-review.png, a screenshot of the Amazon review]

Next, let’s take a look at a (very simplified) example of markup that might be used to generate this review:

<div>
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span>5.0 out of 5 stars</span>
    <span><b>American Biographer: Jon Meacham</b></span>
  </div>
  <div><a href="http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">
            <span>Marian the Librarian</span></a> (NY, NY)  -
  </div>
  <div>
    <b>This review is from:
      <a href="http://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">
      American Lion: Andrew Jackson in the White House (Hardcover)</a></b>
  </div>
  <div class="review">
    American Lion is a wonderfully crafted biography about an incredibly interesting
    and oft-overlooked American who helped shaped this country...
  </div>
</div>

Next, let’s add the RDFa markup that would allow Google to use this review in Rich Snippets. To mark up this XHTML with RDFa, you use the http://data-vocabulary.org namespace and a set of attributes. To see a list of attributes that work with Google’s indexing technology, see this RDF for data-vocabulary.org:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span property="v:rating">5.0 out of 5 stars</span>
    <span><b>American Biographer: Jon Meacham</b></span>
  </div>
  <div><a href="http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">
            <span property="v:reviewer"
                 about="http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">Marian the Librarian</span></a> (NY, NY)  -
            <span property="v:dtreviewed">1st April 2009</span>
  </div>
  <div>
    <b>This review is from:
      <a property="v:itemreviewed"
           about="http://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/"
           href="http://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">
      American Lion: Andrew Jackson in the White House (Hardcover)</a></b>
  </div>
  <div class="review" property="v:description">
    American Lion is a wonderfully crafted biography about an incredibly interesting
    and oft-overlooked American who helped shaped this country...
  </div>
</div>
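
Because the announcement covers microformats as well as RDFa, the same review can also be sketched with hReview class names. The markup below is a rough illustration, not Amazon’s or Google’s actual markup: it assumes the class names defined in the hReview draft on microformats.org (hreview, summary, rating, reviewer, dtreviewed, item, description), and the ISO date in the title attribute is inferred from the “1st April 2009” date used above. Which of these fields Google reads for Rich Snippets is not confirmed here.

```html
<!-- Sketch only: class names assumed from the hReview draft on microformats.org -->
<div class="hreview">
  <!-- summary: the short title of the review -->
  <span class="summary">American Biographer: Jon Meacham</span>
  <!-- rating: the numeric value sits alone inside the element -->
  <span class="rating">5.0</span> out of 5 stars
  <!-- reviewer: an embedded hCard (vcard) naming the author of the review -->
  <span class="reviewer vcard">
    <a class="fn url" href="http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">Marian the Librarian</a>
  </span> (NY, NY) -
  <!-- dtreviewed: machine-readable date in title, human-readable date as text -->
  <abbr class="dtreviewed" title="2009-04-01">1st April 2009</abbr>
  <!-- item: the thing being reviewed, with its name (fn) and link (url) -->
  <div class="item">
    This review is from:
    <a class="fn url" href="http://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">
    American Lion: Andrew Jackson in the White House (Hardcover)</a>
  </div>
  <!-- description: the body text of the review -->
  <div class="description">
    American Lion is a wonderfully crafted biography about an incredibly interesting
    and oft-overlooked American...
  </div>
</div>
```

Unlike the RDFa version, this form needs no namespace declaration: the meaning is carried entirely by agreed class names, which is why it can be layered onto existing HTML without new attributes.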

This initial release covers people and reviews, but Google will be slowly rolling out support for other RDFa vocabularies and microformats as they become available. For more information, see “Marking up content with RDFa” on the Google Webmaster/Site Owners Help site.
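
Since the initial release also covers people, a person might be marked up along the same lines as the review. This is a minimal sketch under stated assumptions: the property names (v:name, v:nickname, v:title, v:affiliation, v:url) are taken to be those of the data-vocabulary.org Person type, and the person, employer, and example.com URL are hypothetical placeholders rather than anything from Google’s announcement.

```html
<!-- Hypothetical person; property names assumed from the data-vocabulary.org Person type -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span property="v:name">Jane Example</span>
  (<span property="v:nickname">Janie</span>),
  <span property="v:title">Reference Librarian</span> at
  <span property="v:affiliation">Example Public Library</span>.
  <!-- rel (not property) is used here because the value is a link target, not text -->
  <a rel="v:url" href="http://www.example.com/jane">Home page</a>
</div>
```

The pattern is the same as for the review: typeof names the kind of thing being described, and each property or rel attribute attaches one fact to it.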

Analysis

While the Semantic Web has been around for years, it has yet to live up to the audacious promises that heralded its introduction to the world. What is the Semantic Web? Here’s the definition from Wikipedia in case you need a refresher:

Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “monkey”, reserving a library book, and searching for a low price for a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing, and combining information on the web.

In short, the Semantic Web is about more “meaningful” content. We’ve perfected the art of scanning text and creating massive distributed indexes that produce highly relevant search results, but when you type in “Swine Flu” you are really still dealing with an inefficient indexing approach that doesn’t know about the meaning of the text being parsed and indexed. Moving toward the Semantic Web will allow our searching technologies to become more intelligent and will set the stage for the next revolution in which computing systems can become more aware of the “meaningfulness of data”.

We’ve already seen a shift toward “semantic search”. Google has been augmenting search results with Google Maps and limited catalog searches, and more recent entries into the search market, such as Amazon’s A9 and the yet-to-be-released Wolfram Alpha, differentiate themselves by the structured data and content that can be extracted from a search result. Until today, when Google provided a social incentive for site designers, we had yet to see a compelling reason for webmasters to place RDFa or microformats into a site so that this semantic data could be mined. This shift toward semantic markup promises to disrupt existing SEO approaches, which are built atop the platform Google provides.

With Google in the game, it now becomes an imperative: sites that want to be listed in search results with Rich Snippets will need to think about RDFa and microformats. Tools that have been designed to present person and review data will now output RDFa and microformat markup compatible with Google by default. Blogging systems like Movable Type or WordPress, ecommerce tools like Magento, and content management tools like Alfresco and Drupal will, very quickly, adopt the formats supported by Google, and in five years’ time we won’t be able to imagine a web that wasn’t supported by semantic markup. We will reminisce about the days when search results were produced by ad-hoc text processing technologies unsupported by meaningful data. The search result you are used to today will seem quaint in comparison to the rich data-centric experience of the emerging Semantic Web.

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” – Tim Berners-Lee

UPDATE (3:52PM): We’ve had some response about failing to mention Yahoo’s SearchMonkey, which also supports RDFa and Microformats. Google is certainly not the first search engine to support RDFa and Microformats, but with 72% of the search market, it has the influence to make people pay attention to them.

  • http://basiscraft.com Thomas Lord

    I think it would improve the article if you mentioned the contributions of the Creative Commons organization to the invention and standardization of RDFa and its relation to ccREL.

    -t

  • http://friendfeed.com/mattcutts Matt Cutts

    Cool, I didn’t know they did an interview.

  • http://www.ships-info.info Ships

    Google are really fast and innovatively developing. They are making new investments in things, which I suppose that are bringing millions to them

  • http://johngoodwin225.wordpress.com John Goodwin

    How does this compare to SearchMonkey from Yahoo! ?

  • Brendan Gibson

    Where can we see this in action?

  • http://keithalexander.co.uk/ Keith Alexander

    In the example, the URIs used identify web pages on Amazon, not people or books.

  • http://buytaert.net Dries Buytaert
  • seutje

    In my experience, if Google hops on board with something, the rest usually aren’t far behind (yea I know yahoo has searchmonkey, not the point), so I think this a big step forward, not just for google, microformats and RDFa, but for the whole web

    of course it still won’t be an instant switch, as there are still enough sites out there that break when viewed with anything other than IE6, but this should definitely speed things up

  • http://www.ninebyblue.com Vanessa Fox

    I was at Google’s event today when they made the announcement and have been writing about SearchMonkey, etc. for a while. My write up about the details is here:
    http://searchengineland.com/google-search-now-supports-microformats-and-adds-rich-snippets-to-search-results-19055

  • http://www.keywordintent.com Jacqui Jones – KeywordIntent

    What will happen to organizations where data is their core asset such as Yellow Pages and other directories?

    Already Yellow Pages around the world have been losing market share, but still manage to stay relevant because they have “structured data”.

    Surely, if these types of businesses apply microformats or RDFa to their data, their main advantage disappears?

    And, if they don’t place RDFa or microformats around their data, Google essentially grabs the same content from original and alternative sources (which they do anyway).

    The pressure is on for these businesses to reinvent themselves pronto.

  • http://memesteading.wordpress.com Gordon Mohr

    The markup example is nice — but what does the resulting ‘rich snippet’ then look like?

  • http://www.ninebyblue.com Vanessa Fox

    Gordon, you can see an example of a review in the article I linked.

  • http://berjon.com/ Robin Berjon

    It’s an interesting step forward, it’s just a shame that they didn’t pick FOAF to markup people.

  • http://corewar.co.uk John

    Great, another new web technology to learn! Let’s hope it provides more value to web users than to SEO spammers.

  • http://blog.zmok.net Roman

    Wonder how Microsoft will react.

  • James

    Of course the example is Amazon. Google ranks Amazon for everything. Can’t wait to see what kind of spam develops from this.

  • http://mysitesdoneright.com BillyG

    With all these tools I’m using, this got past me somehow?

    Thanks for the heads up Timothy!

  • http://www.supaswag.co.uk supaswag

    Me, as a computer, I can’t wait for the new semantic, irony free web. Beep!

  • dbrimlow

    The examples of supposed “semantic markup” used are extremely poor and counter-productive to those of us who are trying to promote the use of actual “semantic markup”. The example used just shows how RDFa is used within some poorly crafted html markup, but the markup itself is NOT semantic at all and falls under the “div-itis” category.

    People making the transition away from tables usually get div-itis, meaning that they use elements for everything instead of . This is just about as bad as using tables for every context, since div elements have no meaning but acting as containers for parts of a web page.

    For web text to be truly semantic, the mark up content should have no relation to how it is to be presented within the design; it requires a proper text hierarchy “meaning” based upon headings, paragraphs and lists – with the absolute limitation and minimalism of markup level presentation … like those tags!

    Divs are only html placeholders used for presentation (layout, positioning, boxes, etc) and intended to be populated with proper logical block level html tags. Spans are only presentation alterations (to change the LOOK not the MEANING) of a select span of data within a pre-existing block of text.

    I can only assume that (based upon proper “semantic markup”) the example should have looked something like this:

    79 of 98 people found the following review helpful:

    5.0 out of 5 stars
    American Biographer: Jon Meacham


    • about=”http://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/”>Marian the Librarian (NY, NY) -
    • 1st April 2009

    This review is from:


    American Lion: Andrew Jackson in the White House (Hardcover)

    American Lion is a wonderfully crafted biography about an incredibly interesting
    and oft-overlooked American who helped shaped this country…

    I would like to know for certain if the RDFa in my example is correct. If not, then once again there is a serious confusion with the naming of technologies …

    “Semantic markup” and “Semantic web” would be two entirely different things.

  • http://www.guava.co.uk Teddie

    This is a great step forward, I have been a proponent for a long time of the fact that in order for search engines to become more useful, they need better information. Although I do think it will take quite some time for the average website to adopt these standards, it reminds me of that Peter Norvig to Tim Berners Lee comment….”We deal with millions of Web masters who can’t configure a server, can’t write HTML. It’s hard for them to go to the next step.”

  • http://www.ianjindal.com Ian Jindal

    Is this Amazon’s interpretation of the hreview format?

    http://microformats.org/wiki/hreview

  • http://brianodonovan.net Brian O'Donovan

    The fact that Google will start indexing RDF is definitely a significant factor encouraging people to add RDF to their site.

    However, there is an unfortunate history of site owners adding irrelevant keywords to their site in an attempt to “game” the google indexing algorithm to give their site a higher ranking than it deserves.

    Having accurate RDF would be very useful. However, having junk metadata in RDF format is no better than having junk metadata in any other format.

  • ssc.mike.pearson

    The NZ Government is consulting on a feed standard to make content more meaningful:
    http://research.elabs.govt.nz/new-zealand-government-feed-standard-2009/

  • http://www.support1000.com Comuputer Repair

    Thanks sharing this information with us.

  • http://www.online-languagetranslators.com John Anderson

    We are still waiting for an RDFa tool for earlier versions of Dreamweaver so that we can insert RDFa into our web pages faster. I was surprised that many webmasters have still not upgraded their websites with this tool. I also don’t see websites with star ratings. I understand that ratings by clients/customers may not work well due to the possibility of competitors’ negative/false ratings in some industries. Of course, Amazon’s competitors would not do that. But in highly competitive fields that may happen.
    Anyway, we will go ahead with RDFa as a step forward in the semantic web.
    For those webmasters who use Dreamweaver, Martin McEvoy announced a new version of the RDFa extension for Dreamweaver 8 and CS4 at http://weborganics.co.uk/files/RDFa-Documents.mxp.

  • http://www.remotesupport.com PC Tech

    Ok, then Microsoft will make some different format and another war will start.

  • http://ibcscorp.com james

    Thanks for the post. I guess like Google has done, as a development community we will just have to support both microformats and RDFa

  • Frank Paolino

    I run a review website, http://www.notesappstore.com/ and we are excited about the addition of microformats for our reviews.

    I mentioned our implementation on my blog:
    http://blog.maysoft.org/blog.nsf/d6plinks/FPAO-88NM7N

    I really like this “data” approach to the web. It will make surfing more helpful.

  • http://www.sietek.hu haver

    We are still waiting for RDF.

  • http://ceg.sonet.hu jonah

    can we see the product of the markup?