Uniform APIs for the data web

The Open Data Protocol is a promising approach for uniform APIs.

The elmcity service connects to a half-dozen other services, including Eventful, Upcoming, EventBrite, Facebook, Delicious, and Yahoo. It’s nice that each of these services provides an API that enables elmcity to read its data. It would be even nicer, though, if elmcity didn’t have to query, navigate, and interpret the results of each of these APIs in different ways.

For example, the elmcity service asks the same question of Eventful, Upcoming, and EventBrite: “What are the titles, dates, times, locations, and URLs of recent events within radius R of location L?” It has to ask that question three different ways, and then interpret the answers three different ways. Can we imagine a more frictionless approach?

I can. Here’s how the question might be asked in a general way using the Open Data Protocol (OData):
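As a minimal sketch, here is one way such a query URL could be composed in Python. The service root, the EventsNear operation with its lat, lon, and radius parameters, and the property names are all hypothetical; only $select, $orderby, and $top are standard OData query options.

```python
from urllib.parse import urlencode, quote


def build_events_query(service_root, lat, lon, radius_km, top=20):
    """Compose an OData-style query URL for events near a location.

    EventsNear, lat, lon, and radius are hypothetical names that a
    service would have to define; the $-prefixed options are standard.
    """
    params = {
        "lat": lat,
        "lon": lon,
        "radius": radius_km,
        "$select": "Title,Start,VenueName,Url",  # only the fields we need
        "$orderby": "Start desc",                # most recent first
        "$top": top,                             # cap the result size
    }
    # Keep "$" and "," readable; percent-encode everything else.
    query = urlencode(params, quote_via=quote, safe="$,")
    return f"{service_root.rstrip('/')}/EventsNear?{query}"


if __name__ == "__main__":
    print(build_events_query("http://example.org/odata/", 41.31, -72.92, 15))
```

The point of the sketch is that the same function would work against any service that exposed the same conventions; only the service root would change.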

An OData reply is a feed of Atom entries, optionally annotated with types. Here’s a sketch of how one of those entries might look as part of a general OData answer to the question:
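To illustrate how a client might read such an entry, the snippet below embeds a made-up entry and extracts its typed properties with Python’s standard library. The event data and property names are invented for illustration; the two dataservices namespaces are the ones used by OData’s Atom format in versions 1 and 2 of the protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}

# A made-up entry; the values and property names are illustrative only.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <id>http://example.org/odata/Events(1)</id>
  <title type="text">Hypothetical concert</title>
  <content type="application/xml">
    <m:properties>
      <d:Title>Hypothetical concert</d:Title>
      <d:Start m:type="Edm.DateTime">2011-04-15T19:00:00</d:Start>
      <d:Latitude m:type="Edm.Double">41.31</d:Latitude>
      <d:Url>http://example.org/events/1</d:Url>
    </m:properties>
  </content>
</entry>"""

# Map a few Edm type annotations to Python converters; untyped values stay strings.
CONVERTERS = {
    "Edm.DateTime": datetime.fromisoformat,
    "Edm.Double": float,
    "Edm.Int32": int,
}


def entry_properties(entry_xml):
    """Return the entry's properties as a dict, honoring m:type annotations."""
    entry = ET.fromstring(entry_xml)
    props = entry.find("atom:content/m:properties", NS)
    type_attr = "{%s}type" % NS["m"]
    result = {}
    for prop in props:
        name = prop.tag.split("}", 1)[1]  # strip the namespace part of the tag
        convert = CONVERTERS.get(prop.get(type_attr), str)
        result[name] = convert(prop.text or "")
    return result


if __name__ == "__main__":
    print(entry_properties(SAMPLE_ENTRY))
```

Because the type annotations travel with the data, the same parsing code would apply to entries from any service, whatever its domain.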

(With the addition of $format=json to the query URL, the same information arrives as a JSON payload.)

Of course there would still be differences among these APIs. Each of the three services in this example has its own naming conventions and its own way of modeling events and venues. It would still take some work to abstract away those differences. But you’d be using a common query mechanism, a common set of data representations, a common way of linking them together, and a common set of helper libraries for many programming environments.

A WordPress thought experiment

Blog publishing systems have long implemented APIs that enable client applications to fetch and post blog entries. For historical reasons there are a variety of these APIs. Because they’re widely adopted in the blog domain, it’s pretty likely that an application that works with one blog system’s implementation of one of the APIs will work with another blog system’s implementation of the same API. But these APIs are specific to the blog domain.

What if blogs had come of age in an era when a uniform kind of API was expected? We could then ask questions of blogs in the same way we could ask questions of event services in the hypothetical example shown above, or of any other kind of service. And we could interpret the answers in the same way too.

Suppose we want to ask a blog service: “What are the published entries since April 10, 2011?” Here’s an OData version of the question:
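A minimal sketch of how a client could compose that kind of query in Python follows. It assumes a hypothetical service root and assumes the service exposes WordPress’s wp_posts table with its usual post_status and post_date columns; the datetime literal follows the OData version 2 URI conventions.

```python
from urllib.parse import quote


def published_since_url(service_root, since_iso):
    """Build an OData query for published posts newer than since_iso.

    service_root is a hypothetical endpoint. wp_posts, post_status, and
    post_date are the standard WordPress table and column names, assumed
    here to be exposed unchanged by the service.
    """
    filter_expr = (
        "post_status eq 'publish' and "
        f"post_date gt datetime'{since_iso}'"
    )
    # Keep quotes and colons readable; spaces become %20.
    encoded = quote(filter_expr, safe="':")
    return f"{service_root.rstrip('/')}/wp_posts?$filter={encoded}&$format=json"


if __name__ == "__main__":
    print(published_since_url("http://example.org/wordpress.svc", "2011-04-10T00:00:00"))
```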

And here’s an answer, in JSON format, from a hypothetical WordPress OData service:

Except it’s not hypothetical! The guid shown in this example points to a real WordPress post. And the uri in the example points to a live OData service that emits the chunk of JSON we see here. If you’re so inclined, you can start at the root of the service and explore all the tables used in that WordPress blog.

How is this possible? I’m running WordPress on Azure; this instance of WordPress uses the SQL Azure database; the database is OData-enabled. In this case I’m allowing only read access. But if the database were writable a blog client could add new entries by sending HTTP POST requests with Atom payloads.
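To make the write scenario concrete, here is a sketch that builds (but does not send) such a POST request in Python. The collection URL is hypothetical, the property names assume the standard WordPress wp_posts columns, and a real service would likely require more columns and Atom elements than this minimal payload carries.

```python
import urllib.request
from xml.sax.saxutils import escape

# Minimal Atom entry carrying OData properties; a real service may demand more.
ENTRY_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <content type="application/xml">
    <m:properties>
      <d:post_title>{title}</d:post_title>
      <d:post_content>{body}</d:post_content>
      <d:post_status>publish</d:post_status>
    </m:properties>
  </content>
</entry>"""


def build_post_request(collection_url, title, body):
    """Return an unsent HTTP POST request that would create a new entry."""
    payload = ENTRY_TEMPLATE.format(title=escape(title), body=escape(body))
    return urllib.request.Request(
        collection_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/atom+xml"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_post_request(
        "http://example.org/wordpress.svc/wp_posts",  # hypothetical endpoint
        "Hello & welcome",
        "A first post.",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; that step is left out because the endpoint here is made up.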

OData for MySQL

Of course WordPress more typically runs on MySQL. Can we do the same kind of thing there? Sort of. Here’s a query that fetches posts from a Linux/MySQL instance of WordPress and returns them as an Atom feed with OData annotations:


In this case the OData view of the underlying MySQL database is provided by MySQLOData, a “PHP-based MySQL OData Server library which exposes all data within a MySQL database to the world in OData ATOM or JSON format.”

There are two issues here. One is my fault. I’m not fluent in PHP and I haven’t been able to get MySQLOData working to its full capability. Do you know of a live instance of MySQLOData that is properly installed and configured? If so, please send me the URL; I’d like to try it out.

The second issue is more fundamental. Suppose MySQLOData becomes a full implementation of OData. In any environment where there is PHP and MySQL, any application built on MySQL could automatically expose an API based on a common query mechanism, a common set of data representations, a common way of linking them together, and a common set of helper libraries. Great! But what if there’s no PHP in the environment? What if there’s only Python? Or only Ruby? A Django- or Rails-based service shouldn’t have to add PHP to the mix in order to provide a uniform API.

If MySQL itself could present an OData interface, then layered services written in any language could automatically provide APIs in a standard way. Here’s a description of how that might work:

If we provide access to existing databases as though they were in hypertext form, the system will get off the ground quicker … What is required is a gateway program which will map an existing structure onto the hypertext model, and allow limited (perhaps read-only) access to it.

If you know your web history that may sound familiar. It’s from Tim Berners-Lee’s 1989 proposal for the World Wide Web.
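The gateway idea can be sketched in a few lines. What follows is a toy, read-only mapping from two OData query options ($select and $top) onto SQL, using Python’s built-in sqlite3 module as a stand-in for MySQL. The function name and the sample table are invented for illustration, and a real gateway would also need to handle $filter, $orderby, paging, metadata, and serialization to Atom or JSON.

```python
import sqlite3


def odata_select(conn, table, select=None, top=None):
    """Answer a tiny subset of OData query options from a SQL table (read-only)."""
    # Accept only table and column names that exist in the schema.
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")}
    if table not in tables:
        raise ValueError(f"unknown entity set: {table}")
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    wanted = select.split(",") if select else columns
    unknown = [c for c in wanted if c not in columns]
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")

    column_list = ", ".join(f'"{c}"' for c in wanted)
    sql = f'SELECT {column_list} FROM "{table}"'
    params = ()
    if top is not None:
        sql += " LIMIT ?"
        params = (int(top),)
    rows = conn.execute(sql, params).fetchall()
    # One dict per row, ready to be serialized as JSON or Atom.
    return [dict(zip(wanted, row)) for row in rows]


def demo_connection():
    """Create an in-memory table loosely modeled on WordPress's wp_posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_status TEXT)")
    conn.executemany(
        "INSERT INTO wp_posts (post_title, post_status) VALUES (?, ?)",
        [("First post", "publish"), ("Second post", "draft")],
    )
    return conn


if __name__ == "__main__":
    print(odata_select(demo_connection(), "wp_posts", select="ID,post_title", top=1))
```

Because the mapping lives next to the database rather than inside any one application, a Django, Rails, or PHP service layered on top would not need to implement it separately.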

There’s more than one way to do it

Of course OData isn’t the only way services could automatically provide uniform APIs. Such things typically come in several flavors. In the blog domain there have always been a few of them: the Blogger API, the metaWeblog API, etc. I think it’s unlikely that we’ll end up with a single flavor of uniform API. But right now we don’t have any uniform flavor! Every service that provides an API has to invent its own query mechanism, data representations, and helper libraries. If you want to mash up services — as we increasingly do — the differences among these APIs create a lot of friction.

OData looks to me like one good way to overcome that friction. I’d love to see OData gateways co-located with every popular database. With such gateways in place, the web of data we’re collectively trying to build would get off the ground quicker.


  • terribly difficult to achieve any impact but for the customers of MS. See its legal terms. http://www.microsoft.com/interop/osp/default.mspx

    It doesn’t look very promising if the only promise is to not sue you.

    • @alberto: What additional promises would you like to see?

  • Jon, I certainly agree with your data-access aspirations, but I think that asking for a common query API is aiming too high. A query API reflects the server’s opinion about what kinds of computations you’d like to do over their data. To use it you have to get inside the server designer’s head. I think we’d make huge progress if we just pulled off a lesser task: ensuring one could fetch _all_ the server’s data, to be processed however you like on your end. Working from your examples, a blog that posts once per day will have a maximum of about 5000 items by now; sending a description of all of them would involve fewer bytes than a typical CSS file seems to require these days. We only really need APIs when the raw data is too big to deliver in its entirety. Such cases are very important but I believe they are not the majority. And as I argued in this blog post: http://groups.csail.mit.edu/haystack/blog/2010/05/31/on-a-few-deadly-data-sins-and-the-entropy-of-open-data/ there are far too many cases when you can’t get at the data at all, so it’s premature to be asking for a computational API.

    • @david You’re right. In many cases datasets are small and/or unconventionally structured. In such cases, a wholesale dump should be a baseline option. In many other cases datasets, whether small or large, are conventionally structured by some engine. In such cases, size permitting, a wholesale dump should also be a baseline option. But if popular engines can be equipped for standard kinds of structured query and navigation, why not also do that? I don’t see these as mutually exclusive scenarios.

  • *cough* SPARQL */cough*

    • @daniel: As I said, There Is More Than One Way To Do It. SPARQL is another.

  • We’re very excited by OData, and we have added support to our CMS. We’re seeing a lot of different uses for it. But the best thing is how easy it is to use.

    Microsoft recently demonstrated our CMS at the Mix conference as an example of how OData can open up the application silos. You can see the video here: