
The laws of information chemistry

Data will flow and recombine, or not, according to principles we teach.

In the course of my work on the elmcity project I’ve talked to a lot of people about forming networks of calendars. One of the major hurdles has been the very idea that we can form such networks, in an ad-hoc way, using informal contracts. Later in this series I’ll explore why that’s a tough concept, and mull over how we might soften it up. Here I’ll focus on an even more basic conceptual stumbling block: information structure.

Everybody learns that things in the physical world are structured in ways that govern how they can or cannot interact. Whether it’s proteins folded into biochemical locks and keys, or metallic parts formed into real locks and keys, we know the drill. The right shape will open the door, the wrong one won’t. You can’t get through grade school without being exposed to that idea.

Unless you’re on an IT track, though, you’ll likely graduate from college without ever learning this corollary: The right information structures open doors, the wrong ones won’t.

My project has shown me that many otherwise well-educated professionals have no intuitive sense of the differences between these various representations of a calendar:

  1. As a PDF file
  2. As an HTML page
  3. As an RSS feed
  4. As an iCalendar feed

These are all just different flavors of computer files, most people think. Pick a format that can be read on a PC or a Mac and you’re good to go. So my local high school, for example, uses PDF:

It irks me that the school publishes this data without acknowledging that it is data, or providing it in a way that’s appropriate for the kind of data it is. In 2010, one of the “tools to succeed in a diverse and interdependent world” has to be a basic working knowledge of information chemistry. The quotation about learning that appears at the top of that image speaks to the underlying principle:

“Treat it as an active process of constructing ideas, rather than a passive process of absorbing information.” – Daniel J. Boorstin

I looked for that quotation’s context, by the way, and didn’t find it in any of Boorstin’s works. Instead it shows up here:

Anyway, I agree with the authors of “From Risk To Renewal: Charting a Course for Reform.” We don’t just passively dwell in social information networks, we actively co-create them. To do that effectively we need to know what will or won’t catalyze a chemical reaction in data space.

The reaction I hope the elmcity project will help catalyze is one that unlocks calendar data and enables it to flow freely through networks without loss of fidelity. In theory, any of the four document flavors listed above could work, if supported by tools that encode and exchange the core structure of an event: a title, a date and time, a link to the authoritative source. In fact there’s only one common flavor that preserves that structure: iCalendar. And there’s only one widely deployed kind of application that reads and writes it: calendar software, examples of which include Google Calendar, Microsoft Outlook, Apple iCal, and Lotus Notes.
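To make that core structure concrete, here is a minimal Python sketch that writes one event as iCalendar text. It is an illustration only, not code from the elmcity project: the function names, the `PRODID` value, and the example.org addresses are assumptions, and it skips details a real feed needs, such as folding long lines and handling local time zones.

```python
from datetime import datetime, timezone

ICAL_UTC = "%Y%m%dT%H%M%SZ"  # iCalendar's UTC date-time form


def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def make_ical_event(title, start, url, uid, stamp):
    """Return a one-event iCalendar document as a string.

    `start` and `stamp` must be timezone-aware datetimes; both are
    written in UTC. `uid` is any string unique to this event.
    """
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//information-chemistry-sketch//EN",  # hypothetical
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp.astimezone(timezone.utc).strftime(ICAL_UTC)}",
        f"DTSTART:{start.astimezone(timezone.utc).strftime(ICAL_UTC)}",
        f"SUMMARY:{escape_text(title)}",   # the title
        f"URL:{url}",                      # link to the authoritative source
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(make_ical_event(
        title="Band concert, spring",
        start=datetime(2010, 5, 1, 19, 0, tzinfo=timezone.utc),
        url="http://example.org/events/1",
        uid="1@example.org",
        stamp=datetime(2010, 4, 1, 12, 0, tzinfo=timezone.utc),
    ))
```

The point is how little is involved: each fact about the event sits on its own named line, so any calendar program can pick out the title, the start time, and the source link without guessing.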

Among technical folk there’s been an on-again, off-again effort to migrate the iCalendar standard from its existing plain text format to one based on XML, which didn’t exist when iCalendar was born. For me it’s a wash. Either flavor can encode the basic facts in a way that enables calendar networks to form. Will translation between the flavors be a problem? It shouldn’t be, but if so I’d regard it as a good problem to have as compared to the one we’ve actually got, which is that nearly all the calendar information available online isn’t in any calendar format. It’s randomly dumped into PDF files, or into HTML pages that don’t (as they might) encode event structure using the hCalendar microformat.

The calendar-like HTML page is so common that a service called FuseCal tried (with pretty good success) to scrape those web pages and turn them into standard iCalendar feeds. The service is gone now, and one piece of the elmcity project (which I describe in the companion how-to article “How to write an elmcity event parser plug-in”) aims to recreate it in a modest way. I’m ambivalent about doing this, though, because web-page scraping sweeps the real problem under the rug. Of course we can’t expect people to read and write raw data-exchange formats. But we can and should expect people to have a clue about what data-exchange formats are, and to know something about when and why to use them.

There has been progress. Starting with the early blogosphere, and continuing into the present era of Facebook and Twitter, the technocracy has introduced the masses to the concept of information feeds. Many people now know, in a general way, that some molecular strands of information combine more readily than others. But the concept isn’t yet fully digested. So, for example, events pages on websites are far more likely to link to RSS or Atom feeds than to iCalendar feeds. With apologies to Guy Kawasaki and Terence Trent D’Arby, that’s the Right Thing done the Wrong Way.

Here’s why: Publishing a data feed is absolutely the right idea, but using RSS or Atom feeds to do it is a category error. Because these feeds don’t encode dates, times, and locations in any standard way, they’re part of the blogosphere but can’t flow through calendar networks.
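The difference shows up when a program tries to read the same event from each kind of feed. The Python sketch below uses made-up sample data (the event, the URL, and the helper names are all hypothetical) and a deliberately naive iCalendar reader that ignores line folding. It only lists which named fields each format hands to software.

```python
import xml.etree.ElementTree as ET

# A plain RSS 2.0 item announcing an event (sample data, not a real feed).
RSS_FEED = """<rss version="2.0"><channel><title>Events</title>
<item>
  <title>Band concert</title>
  <link>http://example.org/events/1</link>
  <pubDate>Mon, 12 Apr 2010 09:00:00 GMT</pubDate>
  <description>Saturday May 1 at 7pm in the high school gym</description>
</item></channel></rss>"""

# The same event as an iCalendar VEVENT (sample data).
ICAL_EVENT = (
    "BEGIN:VEVENT\r\n"
    "SUMMARY:Band concert\r\n"
    "DTSTART:20100501T190000Z\r\n"
    "LOCATION:High school gym\r\n"
    "URL:http://example.org/events/1\r\n"
    "END:VEVENT\r\n"
)


def rss_fields(xml_text):
    """Map element name to text for the first item in an RSS feed."""
    item = ET.fromstring(xml_text).find("./channel/item")
    return {child.tag: child.text for child in item}


def ical_fields(text):
    """Map property name to value for a single, unfolded VEVENT."""
    fields = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        name = name.split(";")[0]  # drop parameters such as ;TZID=...
        if name not in ("BEGIN", "END"):
            fields[name] = value
    return fields


if __name__ == "__main__":
    print(sorted(rss_fields(RSS_FEED)))
    print(sorted(ical_fields(ICAL_EVENT)))
```

In the RSS item the only machine-readable date is `pubDate`, which says when the announcement was posted. The event's own time and place are buried in free text that a person can read but a calendar program cannot. The iCalendar version names them outright as `DTSTART` and `LOCATION`.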

Calendars, of course, are just one of many types of data that can drive online chemical reactions. We’re reaching a consensus that open publication of data is a necessary condition. But it’s not sufficient. We’ve always expected educated citizens to know at least basic physics and chemistry. Now we need to discover, write down, and teach the analogous laws that govern social information networks.

  • Isleshire

    I am not at all sure that I understand what is being said here; except that the local chamber of commerce is always sending me email with date, time, location and action in a PDF format that I can’t capture and paste into my Outlook Calendar. They are trying to communicate, but failing because their methods are a kludge.

    • http://radar.oreilly.com/jonu Jon Udell

      You understand perfectly. I would rephrase “methods are a kludge” however. I’d say their mental toolkit is missing basic principles, through no fault of their own, because we haven’t yet formulated or taught those principles in a mainstream way.

  • Steffan Antonas

    Jon,

    Do you think that ‘Doing The Right Thing The Wrong Way’ is mostly a cultural/habit problem? It seems so. It doesn’t seem like we need to invent a new technological solution – we’ve got the means to “Do The Right Thing The Right Way” we just need people to decide to get outside their comfort zone (RSS feeds etc) and use a format that works when they click publish.

    The decision point for most people publishing this type of information comes when they ask themselves “how do I publish/link this information quickly in a way that I’m familiar with, that’s easily understandable and available to me and others”. If you can take that sentence, and remove “that I’m familiar with”, what you get is the true problem that needs to be solved (i.e. how do we make it easy for people to publish in formats that are highly usable).

    What (if anything) is elmcity doing to force/encourage the right human behaviors + formatting into people’s current processes? By that I mean giving people options during their EXISTING workflows like the action of clicking “publish as iCalendar feed” on major existing platforms? If you give people an easy option that fits in their existing systems (“publish as calendar”) they’ll use it, and they’ll learn the benefits of doing so over time. Publishers become agents for the cause that subsequently educate readers and change the culture over time that way.

    Has any effort been made to partner with platforms like WordPress or Typepad (just to name a few common ones) to make creation of those types of feeds easier as part of the standard RSS/HTML workflow? Just wondering.

    • http://radar.oreilly.com/jonu Jon Udell

      “I mean giving people options during their EXISTING workflows like the action of clicking “publish as iCalendar feed” on major existing platforms?”

      I chose the calendar space as a laboratory in which to work out the larger ideas because the existing and popular workflows already do the right thing. Google Calendar is a great example. If you embed a GCal widget on a web page, as so many people do, you are also automatically publishing an iCalendar feed. But almost nobody knows a) that it happens, and b) what it means.

      We tell people to publish feeds, so they dutifully do, but the RSS/iCalendar category error shows that core underlying concepts are missing:

      - A calendar is a data set

      - Data has structure

      - When structures align, information can flow through networks

      - Networks of people form in relation to networks of data

      I don’t know whether to call this computational thinking, or systems thinking, or the science of networked information, or something else. But I think that whatever we call it, this stuff will be as fundamental from now on as reading, writing, and arithmetic. We need to codify the principles and teach them to everyone.

  • Kelly J. Cooper

    I’ve floated in and out of IT since 1989 and one thing I’ve experienced over and over is that regular people (i.e., non-technical types or people who are technical but in a non-CS field) will not do something or use something until it’s (1) easy and (2) available.

    This applies to problems other than those in technology, but the contrast in technology is often stark (and therefore easier to discuss). Fear of the unknown and barrier to entry issues are both surmountable when a given thing is easy & available.

    For instance, if you said to the High School, “Can you give that to me in raw text as well as PDF?” they’d probably be willing as no one composes IN Adobe product-land. They probably composed it in Word or Excel where it’s easy to export a txt version.

    If you then made a program that interpreted the raw data & turned it into an online calendar (and tweaked it til you worked out the bugs), you could easily train the High School admin folks to always email a text version to a specific address that would automatically grind out an online calendar.

    They’d probably even add a clickable link to the online calendar at the bottom of the PDF. Someday the PDF might even get superseded by the online calendar, but not until all the folks who like a nicely formatted printed calendar for the fridge have moved on.

    • http://radar.oreilly.com/jonu Jon Udell

      “Fear of the unknown and barrier to entry issues are both surmountable when a given thing is easy & available.”

      Agreed. In this case, the given thing is easy and available: Google Calendar, Microsoft Outlook, etc. That’s what makes this such an interesting test case!

  • Kris Tuttle

    This post makes me think that until we incorporate a standard course on information processing alongside chemistry and physics progress will be slow.

    Some people may say “we don’t want to teach everyone programming” but that’s not the point at all. A basic course in information processing would help everyone to be prepared for what is increasingly a world of online information.

    I’ve been frustrated for years because in so many cases data and information is produced and published in formats that make it impossible to automate, build on and address more advanced use cases.

    Good post, great topic, I wish it got more productive focus in the community.

    • http://radar.oreilly.com/jonu Jon Udell

      “This post makes me think that until we incorporate a standard course on information processing alongside chemistry and physics progress will be slow.”

      I would go farther and say that there is a grammar of networked information systems that’s so basic we ought to teach it in grade school along with the three Rs. It would teach the properties of, for example:

      - structured data

      - indirection

      - globally unique names

      - pub/sub communication patterns

  • Peter Smith

    Going back to the Boorstin quote (or not). I wonder if it can’t be correctly found because it’s not on the web (but it may be in the library).

    Anyway, there is also something here about being able to see what is in our network ... and what we are blind to (or how much effort one might put into the network).

  • bowerbird

    jon said:
    > My project has shown me that many
    > otherwise well-educated professionals have
    > no intuitive sense of the differences between
    > these various representations of a calendar:
    > As a PDF file
    > As an HTML page
    > As an RSS feed
    > As an iCalendar feed
    > These are all just different flavors of
    > computer files, most people think.
    > Pick a format that can be read on
    > a PC or a Mac and you’re good to go.

    the problem is not with the users of the tools…
    the problem is with the _designers_ of the tools.
    they haven’t given people a tool that’s easy to use
    and satisfies _all_ the goals of those various formats.

    but even _information_science_professionals_ haven’t
    cracked this nut. show me the document file-format
    which is _easy_ for naive users to create and edit, yet
    will still create highest-quality .pdf and .html output,
    _and_ e-books compatible with the kindle and .epub.

    so, when the highly-paid specialists who _create_
    and _catalog_ books are still operating in the dark,
    is it a surprise that the masses are not enlightened?

    don’t get me wrong. i think it would be wonderful
    if ordinary people understood the structural nature
    of the information that they are creating, i sure do.

    but until we have achieved that pedagogical triumph,
    i think we should build tools for them that manage to
    _elicit_ the structure from ‘em without their complicity.

    -bowerbird

  • Phil Earnhardt

    I do have a quibble with one of your points. Are the doors really shut? Aren’t you really pointing to an opportunity for a start-up to open the doors on wrongly-published information?

    What if there were an easy way to bring some fluidity to the data in a PDF file?

  • Warren Whitlock

    loved the Boorstin quote and reference link.

    But I have to ask… is it really the goal of education to create a learning environment? Seems so much of it is politics and discipline.. how could we ever expect them to encourage thinking?

  • http://publicmeetings.info Steven Clift

    What about trying to help the “networks” of those who produce local calendars nationally to use the right tools?

    So for example, what if the national US Chamber of Commerce could offer a calendar app to the thousands of local chambers that maintain community event calendars, and likewise the American Library Association, the National League of Cities, Lions, Rotary, National School Public Relations Association, etc.?

    More simply, they might share a feature comparison chart on existing tools with their members with a call for calendars that “flow freely through networks without loss of fidelity.”

    With our http://publicmeetings.info convening, we are particularly interested in that concept coming to government meetings along with agenda information.

    Steven Clift
    E-Democracy.org