The laws of information chemistry

Data will flow and recombine, or not, according to principles we teach.

In the course of my work on the elmcity project I’ve talked to a lot of people about forming networks of calendars. One of the major hurdles has been the very idea that we can form such networks, in an ad-hoc way, using informal contracts. Later in this series I’ll explore why that’s a tough concept, and mull over how we might soften it up. Here I’ll focus on an even more basic conceptual stumbling block: information structure.

Everybody learns that things in the physical world are structured in ways that govern how they can or cannot interact. Whether it’s proteins folded into biochemical locks and keys, or metallic parts formed into real locks and keys, we know the drill. The right shape will open the door, the wrong one won’t. You can’t get through grade school without being exposed to that idea.

Unless you’re on an IT track, though, you’ll likely graduate from college without ever learning this corollary: The right information structures open doors, the wrong ones won’t.

My project has shown me that many otherwise well-educated professionals have no intuitive sense of the differences between these various representations of a calendar:

  1. As a PDF file
  2. As an HTML page
  3. As an RSS feed
  4. As an iCalendar feed

These are all just different flavors of computer files, most people think. Pick a format that can be read on a PC or a Mac and you’re good to go. So my local high school, for example, publishes its calendar as a PDF file.

It irks me that the school publishes this data without acknowledging that it is data, or providing it in a way that’s appropriate for the kind of data it is. In 2010, one of the “tools to succeed in a diverse and interdependent world” has to be a basic working knowledge of information chemistry. The quotation about learning that appears at the top of the school’s PDF calendar speaks to the underlying principle:

“Treat it as an active process of constructing ideas, rather than a passive process of absorbing information.” – Daniel J. Boorstin

I looked for that quotation’s context, by the way, and didn’t find it in any of Boorstin’s works. Instead it shows up in “From Risk To Renewal: Charting a Course for Reform.”

Anyway, I agree with the authors of “From Risk To Renewal: Charting a Course for Reform.” We don’t just passively dwell in social information networks, we actively co-create them. To do that effectively we need to know what will or won’t catalyze a chemical reaction in data space.

The reaction I hope the elmcity project will help catalyze is one that unlocks calendar data and enables it to flow freely through networks without loss of fidelity. In theory, any of the four document flavors listed above could work, if supported by tools that encode and exchange the core structure of an event: a title, a date and time, a link to the authoritative source. In practice there’s only one common flavor that preserves that structure: iCalendar. And there’s only one widely deployed kind of application that reads and writes it: the calendar program, examples of which include Google Calendar, Microsoft Outlook, Apple iCal, and Lotus Notes.
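To make the “core structure of an event” concrete, here is a minimal Python sketch, assuming RFC 5545 iCalendar as the target, that writes one event as an iCalendar object. The function names, UID, PRODID, and example.org URL are hypothetical placeholders. Besides the title (SUMMARY), start time (DTSTART), and link (URL), the spec also requires UID and DTSTAMP on every event and VERSION and PRODID on the enclosing calendar, so the sketch includes them. It skips folding of long lines, so treat it as an illustration rather than a complete writer.

```python
from datetime import datetime, timezone


def escape_text(value: str) -> str:
    """Escape an iCalendar TEXT value: backslash, semicolon, comma, newline."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def make_vevent(uid: str, title: str, start: datetime, url: str) -> str:
    """Return a minimal iCalendar object holding a single event.

    Hypothetical helper: encodes title, start time, and authoritative link,
    plus the UID and DTSTAMP properties that every VEVENT must carry.
    Line folding for lines longer than 75 octets is omitted.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//calendar sketch//EN",  # placeholder product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp}",
        # No TZID and no trailing Z, so this is a "floating" local time.
        f"DTSTART:{start.strftime('%Y%m%dT%H%M%S')}",
        f"SUMMARY:{escape_text(title)}",
        f"URL:{url}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are delimited by CRLF.
    return "\r\n".join(lines) + "\r\n"


ics = make_vevent(
    "spring-concert-2010@example.org",
    "Spring Concert, Band and Chorus",
    datetime(2010, 3, 5, 19, 0),
    "http://example.org/events/spring-concert",
)
print(ics)
```

The point is that each fact lands in a named property that software can find, rather than in a page layout that only a human can read.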

Among technical folk there’s been an on-again, off-again effort to migrate the iCalendar standard from its existing plain text format to one based on XML, which didn’t exist when iCalendar was born. For me it’s a wash. Either flavor can encode the basic facts in a way that enables calendar networks to form. Will translation between the flavors be a problem? It shouldn’t be, but if so I’d regard it as a good problem to have as compared to the one we’ve actually got, which is that nearly all the calendar information available online isn’t in any calendar format. It’s randomly dumped into PDF files, or into HTML pages that don’t (as they might) encode event structure using the hCalendar microformat.
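For a sense of what encoding event structure with hCalendar buys you, here is a small Python sketch built on the standard library’s html.parser. The class names vevent, summary, dtstart, and url come from the hCalendar microformat; the parser class and the sample markup are hypothetical. It handles only the common pattern in which dtstart rides in an abbr element’s title attribute and url is a link’s href, and it does not track where a vevent element ends, so it is a sketch, not a conforming microformats parser.

```python
from html.parser import HTMLParser


class HCalendarParser(HTMLParser):
    """Collect summary, dtstart, and url from hCalendar 'vevent' elements.

    Simplifications: any end tag stops text capture (so nested markup
    inside a summary truncates it), and the end of a vevent is not tracked.
    """

    def __init__(self):
        super().__init__()
        self.events = []
        self._current = None     # event dict being filled, once a vevent starts
        self._text_field = None  # property whose text content is being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self._current = {}
            self.events.append(self._current)
        if self._current is None:
            return
        if "dtstart" in classes:
            # abbr design pattern: machine-readable date in the title attribute
            self._current["dtstart"] = attrs.get("title")
        if "url" in classes and tag == "a":
            self._current["url"] = attrs.get("href")
        if "summary" in classes:
            self._text_field = "summary"
            self._current["summary"] = ""

    def handle_endtag(self, tag):
        self._text_field = None

    def handle_data(self, data):
        if self._current is not None and self._text_field:
            self._current[self._text_field] += data


sample = """<div class="vevent">
  <a class="url summary" href="http://example.org/events/spring-concert">Spring Concert</a>
  <abbr class="dtstart" title="2010-03-05T19:00">March 5, 7pm</abbr>
</div>"""

parser = HCalendarParser()
parser.feed(sample)
print(parser.events)
```

A human sees “March 5, 7pm”; software sees an unambiguous timestamp in the same page, which is what lets an HTML events page join a calendar network.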

The calendar-like HTML page is so common that a service called FuseCal tried (with pretty good success) to scrape those web pages and turn them into standard iCalendar feeds. The service is gone now, and one piece of the elmcity project (which I describe in the companion how-to article “How to write an elmcity event parser plug-in”) aims to recreate it in a modest way. I’m ambivalent about doing this, though, because web-page scraping sweeps the real problem under the rug. Of course we can’t expect people to read and write raw data-exchange formats. But we can and should expect people to have a clue about what data-exchange formats are, and to know something about when and why to use them.
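Here’s a hypothetical Python sketch of the kind of scraping the paragraph above describes. It is not the elmcity plug-in interface; the line format, the function name, and the sample text are assumptions made up for illustration. It also shows why scraping feels like sweeping the problem under the rug: an entry that doesn’t match the expected pattern is lost unless the scraper notices and reports it.

```python
import re
from datetime import datetime

# Assumed page format, one event per line: "March 5, 2010 7:00 PM - Spring Concert"
LINE = re.compile(
    r"^(?P<when>[A-Z][a-z]+ \d{1,2}, \d{4} \d{1,2}:\d{2} [AP]M) - (?P<title>.+)$"
)


def scrape_events(text: str):
    """Turn calendar-like text into (start, title) pairs.

    Returns the parsed events plus the non-blank lines that did not fit
    the pattern, so the caller can see what the scraper silently missed.
    """
    events, skipped = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            skipped.append(line)
            continue
        start = datetime.strptime(m.group("when"), "%B %d, %Y %I:%M %p")
        events.append((start, m.group("title")))
    return events, skipped


page_text = """March 5, 2010 7:00 PM - Spring Concert
March 12, 2010 - Early dismissal
April 2, 2010 6:30 PM - Science Fair"""

events, skipped = scrape_events(page_text)
print(events)
print(skipped)
```

Here the “Early dismissal” line has no time, so it lands in skipped. Every publisher’s page needs its own pattern, whereas a feed in a standard format needs none.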

There has been progress. Starting with the early blogosphere, and continuing into the present era of Facebook and Twitter, the technocracy has introduced the masses to the concept of information feeds. Many people now know, in a general way, that some molecular strands of information combine more readily than others. But the concept isn’t yet fully digested. So, for example, events pages on websites are far more likely to link to RSS or Atom feeds than to iCalendar feeds. With apologies to Guy Kawasaki and Terence Trent D’Arby, that’s the Right Thing done the Wrong Way.

Here’s why: Publishing a data feed is absolutely the right idea, but using RSS or Atom feeds to do it is a category error. Because these feeds don’t encode dates, times, and locations in any standard way, they’re part of the blogosphere but can’t flow through calendar networks.
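A short Python sketch shows the category error. The RSS item below is hypothetical. RSS 2.0 defines pubDate as the date the item was published, so the only machine-readable date in the item is when the announcement was posted; the event’s own date and time exist only as prose in the description.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A hypothetical RSS 2.0 item announcing an event.
RSS_ITEM = """<item>
  <title>Spring Concert</title>
  <link>http://example.org/events/spring-concert</link>
  <pubDate>Mon, 15 Feb 2010 09:12:00 GMT</pubDate>
  <description>Join us Friday, March 5 at 7pm in the auditorium.</description>
</item>"""

item = ET.fromstring(RSS_ITEM)

# pubDate says when the item was posted, not when the event happens.
published = parsedate_to_datetime(item.findtext("pubDate"))

# The event's date, time, and place survive only as free text.
event_text = item.findtext("description")

print(published)
print(event_text)
```

A calendar program reading this item could place it on February 15, the posting date, but not on March 5, the event date. An iCalendar VEVENT would carry the event date directly in its DTSTART property.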

Calendars, of course, are just one of many types of data that can drive online chemical reactions. We’re reaching a consensus that open publication of data is a necessary condition. But it’s not sufficient. We’ve always expected educated citizens to know at least basic physics and chemistry. Now we need to discover, write down, and teach the analogous laws that govern social information networks.

