Among the elmcity hubs that started up last week were Santa Rosa, Calif., and Bellingham, Wash. Both towns’ local newspapers, the Press Democrat and the Bellingham Herald, use a service called Zvents to manage their online calendars. The curators who started the Santa Rosa and Bellingham hubs, Sean Boisen and Tim Sawtell, wondered if they could subscribe these hubs to iCalendar feeds from Zvents.
At first glance the answer was yes. On the Press Democrat’s site, for example, if you view the Arts & Crafts category, you’ll find this encouraging cluster of icons and links:
An iCalendar feed? Sweet! But alas, while that “Save as iCal” does yield an iCalendar response, it’s an empty shell:
Why? Beats me. If someone knows, I’d love to hear the explanation. Meanwhile, what about the corresponding RSS feed? I wasn’t hopeful. In my work on the elmcity project I often see an error of the sort I discussed in “Developing intuitions about data.” People tend to conflate the purposes of an RSS feed, which typically conveys headlines and links to people, and an iCalendar feed, which conveys dates and times to computers. This category error is so common that I’ve enshrined it in a slide I’ve used in several recent talks.
But I opened up the Press Democrat’s RSS feed anyway, and here is what I found:
Even if you know about such things as XML, RSS, and xCal, pretend for a moment that you don’t. Anyone can see that there is structure here: <xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>. That makes this feed very different from most RSS feeds that purport to represent calendar events, which typically look like this:
We humans have no trouble understanding “Sat, Nov 20 10:00a.” The year is omitted but we know what’s meant. Likewise we can parse a wide range of alternatives, such as “Saturday, November 20, at 10:00.” Does that mean AM or PM? We just know that it’s AM; a home tour wouldn’t start on Saturday at 10PM. Conversely we just know that a blues band wouldn’t start playing on Saturday at 10AM.
Since we aren’t aware that we hold this tacit knowledge, it doesn’t occur to us that computers lack it, or that as a result they require explicit rules and structure. But if you want your data to syndicate around the web, you’ve got to provide rule-based structure. Since iCalendar is the most ubiquitous format for event data, that’s currently the best way to do it. Here’s that same event in iCalendar:
SUMMARY:Event: JLNS Holiday Home Tour & Winter Market at Friedman
Event Center, Sat, Nov 20 10:00a
A point that technologists often miss, when we fight religious wars amongst ourselves about competing formats — RSS versus Atom, iCalendar versus xCalendar, and so on — is that the existence of structure matters far more than the kind of structure. Fig. 1 and Fig. 3 are two species within the same genus. Fig. 2, though, belongs to another phylum altogether. If you’re using the method shown in Fig. 2 to syndicate your data on the web, you’re doing it wrong. That RSS feed is no more useful for the purpose than a PDF file, or an HTML file.
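To make the difference concrete, here is a minimal Python sketch. It takes the two date strings quoted earlier and applies one explicit rule to each. The variable names are mine and purely illustrative. The structured timestamp parses on the first try, while the headline-style text defeats the rule, and any rule that did accept it would still have to guess the year.

```python
from datetime import datetime

# One explicit rule: year, month, day, time, and UTC offset, in that order.
RULE = "%Y-%m-%d %H:%M:%S %z"

# Structured: the xCal timestamp from the Zvents RSS feed.
structured = "2010-11-20 10:00:00 +0000"
start = datetime.strptime(structured, RULE)
print(start.isoformat())

# Unstructured: the same moment as people write it. No year, no offset,
# and "10:00a" follows a convention that the rule knows nothing about.
unstructured = "Sat, Nov 20 10:00a"
try:
    datetime.strptime(unstructured, RULE)
    parsed_unstructured = True
except ValueError:
    parsed_unstructured = False

print("free text parsed by the same rule:", parsed_unstructured)
```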
When I realized that Zvents produces RSS+xCal feeds, and that multiple newspaper sites rely on Zvents, I added support for that format to the elmcity service. A translator reads RSS+xCal and writes iCalendar. Because the Zvents flavor of RSS+xCal is well structured, it was trivial to create that translator.
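A translator of this kind can be sketched in a few lines of Python. This is not the elmcity implementation, just an illustration under stated assumptions: the xCal namespace URI, the sample feed, the example.com link, and all function names are mine. It also skips details a real translator would need, such as end times, folding of long lines, and time zones other than the offset given in the feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the feed declares the xCal prefix with this namespace URI.
XCAL = {"xCal": "urn:ietf:params:xml:ns:xcal"}


def escape_text(value):
    """Escape the characters that are special in iCalendar TEXT values."""
    for raw, escaped in (("\\", "\\\\"), (";", "\\;"), (",", "\\,"), ("\n", "\\n")):
        value = value.replace(raw, escaped)
    return value


def to_ical_utc(stamp):
    """Turn '2010-11-20 10:00:00 +0000' into '20101120T100000Z'."""
    parsed = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S %z")
    return parsed.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def rss_xcal_to_ical(rss_text):
    """Translate every RSS item that carries an xCal:dtstart into a VEVENT."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//rss-xcal-sketch//EN"]
    for item in ET.fromstring(rss_text).iter("item"):
        dtstart = item.findtext("xCal:dtstart", namespaces=XCAL)
        if dtstart is None:
            continue  # no structured start time, so nothing to translate
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        lines += [
            "BEGIN:VEVENT",
            f"UID:{link or title}",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{to_ical_utc(dtstart)}",
            f"SUMMARY:{escape_text(title)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


# A made-up feed in the general shape of RSS+xCal, for illustration only.
SAMPLE = """<rss version="2.0" xmlns:xCal="urn:ietf:params:xml:ns:xcal">
  <channel>
    <item>
      <title>JLNS Holiday Home Tour &amp; Winter Market</title>
      <link>http://example.com/events/1</link>
      <xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>
    </item>
  </channel>
</rss>"""

ical = rss_xcal_to_ical(SAMPLE)
print(ical)
```

The translation is short because the feed already says which element holds the start time. There is nothing to infer.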
This new feature for elmcity hubs creates some interesting opportunities. For example, since each Zvents feed is the result of a query, the set of these RSS+xCal feeds is unbounded. Here’s one kind of query used on the Press Democrat’s events page; it lists events in the “Dance” category.
We can easily transform that URL into one that yields the corresponding RSS feed. Several principles are at work in that simple transformation:
- query — the feed is the output of an open-ended search
- data structure — a structured representation of the search is available as RSS+xCal
- transformation — from RSS+xCal to iCalendar
- abstraction and generalization – what works for one category works for all
Even more is possible. Suppose you’re a grief counselor in Santa Rosa, and you would like to provide your clientele with a comprehensive list of support resources. Here’s a useful search:
It yields two recurring events for two different support groups at Hospice By The Bay.
- Free Hospice By The Bay Drop-in Group Supports Newly Bereaved: “Join others who are beginning the journey through grief at a free, ongoing, drop-in …” 12:00p to 1:00p (repeats 9 times), Hospice By The Bay
- Hospice By The Bay Support Group for Spousal/Partner Loss: “Hospice By The Bay offers an eight-week support group to help adults who have lost …” 10:00a to 11:30a, Hospice By The Bay
Here’s a transformation of that search URL that yields an RSS+xCal data feed:
That feed can now be further transformed into an iCalendar feed and included in an elmcity hub, or in any other cloud-based service or device-based app that reads iCalendar feeds. If you wanted to create a bereavement category in an elmcity hub you’d be off to a great start! But where else would you look? There’s plenty of information about public events on the web today. But only a tiny fraction of it exists as structured data that can flow through syndicated networks. Most of it lives in PDF files, or HTML files, that are only valuable to people who find their way to the sites that serve up those files.
In an effort to visualize this iceberg of unstructured information below the waterline of the data web, I added a feature to the elmcity service that searches for recurring events. It works by looking for the kinds of phrases that we humans use in our discourse: “first Monday of every month at 9PM” or “2nd and 4th Tuesday, 6:15-7:45 pm.” In this week’s companion article I show how that search harvests pages containing these terms from Google and Bing. Here, let’s consider a few of the 3,500 items found when running that kind of search for Santa Rosa:
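For readers who want a feel for the phrase matching, here is a rough Python sketch. It is not the elmcity code. The pattern and the function name are my own assumptions, and a real search would need many more variations. It looks for an ordinal, or several ordinals joined by "and", followed by a weekday name.

```python
import re

ORDINAL = r"(?:1st|2nd|3rd|4th|5th|first|second|third|fourth|fifth|last)"
WEEKDAY = r"(?:mon|tues?|wed(?:nes)?|thu(?:rs?)?|fri|sat(?:ur)?|sun)(?:day)?s?"

# One or more ordinals (joined by a comma, "and", or "&"), then a weekday.
RECURRING = re.compile(
    rf"\b{ORDINAL}(?:\s*(?:,|and|&)\s*{ORDINAL})*\s+{WEEKDAY}\b",
    re.IGNORECASE,
)


def find_recurring_phrases(text):
    """Return every substring that looks like a recurring-event phrase."""
    return [match.group(0) for match in RECURRING.finditer(text)]


samples = [
    "first Monday of every month at 9PM",
    "2nd and 4th Tuesday, 6:15-7:45 pm",
    "Sat, Nov 20 10:00a",
]
for sample in samples:
    print(sample, "->", find_recurring_phrases(sample))
```

The third sample is a one-time event, so the pattern finds nothing in it.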
Investigating the fourth item, North Bay Bereavement and Grief Support Programs, we find a bunch of events represented in an unstructured way:
I’m sure the Press Democrat would love to include these events on its calendar. It can’t, though, because there’s only one way for Sutter VNA and Hospice to get its support group meetings onto the Press Democrat’s calendar. Somebody has to log into the site and input the data.
That model has never worked well, and it never will. The folks at Sutter VNA and Hospice only want to input that information once, on their own website. And that’s all they should be expected to do! Their site ought to be the authoritative source for both human-readable information about events and machine-readable data that can syndicate to the Press Democrat or to any other site that needs it.
Unfortunately the Sutter VNA folks don’t know about this dual possibility, and don’t realize that they could achieve it using Google Calendar, or Hotmail Calendar, or any other single source of human-readable text and machine-readable data about public events.
Likewise, the Press Democrat does not realize that it could subscribe to a data feed from Sutter VNA, once, and thereafter automatically receive a stream of data as comprehensive and accurate as the authoritative source wishes to provide.
This model for collective information management relies on principles that computational thinkers know and apply, including:
- pub/sub — the communication pattern is publish/subscribe
- indirection — event data is passed by reference, not by value, from publisher to subscriber
- syndication — a loosely-coupled network of publishers and subscribers
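These principles can be shown in miniature. The Python sketch below is only an analogy, with hypothetical class names of my own. A callable stands in for a feed URL, a `Hub` holds references to feeds rather than copies of their contents, and a change made once at the source shows up in the hub without anyone re-entering it.

```python
class Publisher:
    """An authoritative source that maintains its own list of events."""

    def __init__(self):
        self._events = []

    def add(self, event):
        self._events.append(event)

    def feed(self):
        # Stand-in for fetching the publisher's iCalendar feed URL.
        return list(self._events)


class Hub:
    """A subscriber that stores references to feeds, not copies of events."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name, feed):
        self._subscriptions[name] = feed  # keep the reference only

    def events(self):
        merged = []
        for name in sorted(self._subscriptions):
            merged.extend(self._subscriptions[name]())  # resolve at read time
        return merged


hospice = Publisher()
hospice.add("Drop-in grief support group")

hub = Hub()
hub.subscribe("hospice", hospice.feed)  # pass by reference
copied_once = hospice.feed()            # pass by value: a one-time copy

hospice.add("Spousal loss support group")  # the source updates itself

print(hub.events())   # reflects the update
print(copied_once)    # stale
```

The one-time copy is the "somebody has to log in and input the data" model. The subscription is the syndication model.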
How might we teach these kinds of principles to the Sutter VNAs and Press Democrats of the world? Maybe we can start by teaching them to the kids we think are digital natives, but who don’t actually learn these principles — because we haven’t formulated them and don’t teach them.
If you teach in a middle school or a high school, here’s an interesting civics lesson you could try. Spin up an elmcity hub for your town, point kids at the unstructured iceberg revealed by the search feature, and show them how to use a service like Google Calendar or Hotmail Calendar to convert unstructured event information into structured event data that can syndicate through the hub.
The task can easily be parallelized by carving the list of search results into chunks, and assigning chunks to individual students or teams of students. Working together they should soon be able to produce a substantial calendar of events that won’t appear in any existing online directory. That calendar will be both a valuable civic contribution and a lesson in underlying principles.
For extra credit, have the students engage with the sources and explain the principles to them. The script might go like this:
Dear Mr. Jones,
We’re students at the Jefferson Middle School, and we’re working on a class project to improve the amount and quality of online event information for our community. We noticed that the following information is available on your website: [EXAMPLES].
However, these events aren’t published in a form that enables them to show up automatically elsewhere — for example, on the Herald’s site, or the Chamber of Commerce site, or on people’s personal calendars. To show how that can work, we have reformulated your information as a data feed. You can see it merged together with other data feeds here: [EXAMPLE].
This is just a demonstration. We’re not the appropriate source for your data; you are. As part of our class project, we’re reaching out to organizations like yours to show them how they can publish their own event information in two ways: as text for people to read, and also as data for computers to process and for networks to syndicate.
We know that sounds complicated, but it’s really just a way of applying the ordinary calendar software that you probably already have and use. May we contact the person in your organization who’s responsible for the events page on your website, and make a presentation about how you could be publishing event information in a more useful way?
Kayla Smith, Tim Miller, Samantha Williams
Jefferson Middle School Civic Data Project
If you’re not a teacher yourself, but you know teachers who might like to try this project-based exercise in civic data gathering and computational thinking, by all means invite them to contact me. I’ll be happy to help set up the exercise, support it, and document the outcome.