A lesson in civics, public data, and computational principles

The benefits of information principles are revealed through education, so let's start with digital natives.

Among the elmcity hubs that started up last week were Santa Rosa, Calif. and Bellingham, Wash. Both towns’ local newspapers, the Press Democrat and the Bellingham Herald, use a service called Zvents to manage their online calendars. The curators who started the Santa Rosa and Bellingham hubs, Sean Boisen and Tim Sawtell, wondered if they could subscribe these hubs to iCalendar feeds from Zvents.

At first glance the answer was yes. On the Press Democrat’s site, for example, if you view the Arts & Crafts category, you’ll find this encouraging cluster of icons and links:

An iCalendar feed? Sweet! But alas, while that “Save as iCal” link does yield an iCalendar response, it’s an empty shell:

BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
PRODID:Zvents Ical
END:VCALENDAR

Why? Beats me. If someone knows, I’d love to hear the explanation. Meanwhile, what about the corresponding RSS feed? I wasn’t hopeful. In my work on the elmcity project I often see an error of the sort I discussed in “Developing intuitions about data.” People tend to conflate the purposes of an RSS feed, which typically conveys headlines and links to people, and an iCalendar feed, which conveys dates and times to computers. This category error is so common that I’ve enshrined it in a slide I’ve used in several recent talks.

But I opened up the Press Democrat’s RSS feed anyway, and here is what I found:

<item>
<title>Event: JLNS Holiday Home Tour &amp; Winter Market at Friedman Event Center, Sat, Nov 20 10:00a</title>
<description>The Junior League of Napa-Sonoma presents a tour of prestigious homes in Bennett Valley, all festively decorated, with all proceeds to benefit local charities</description>
<link>http://events.pressdemocrat.com/santa-rosa-ca/events/show/139081965-jlns-holiday-home-tour-winter-market</link>
<xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>
<xCal:dtend>2010-11-20 16:00:00 +0000</xCal:dtend>
<xCal:location>http://events.pressdemocrat.com/santa-rosa-ca/venues/show/672937-friedman-event-center</xCal:location>
</item>

Fig. 1: An event in the Press Democrat’s RSS events feed

Even if you know about such things as XML, RSS, and xCal, pretend for a moment that you don’t. Anyone can see that there is structure here: <xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>. That makes this feed very different from most RSS feeds that purport to represent calendar events, which typically look like this:

<item>
<title>Event: JLNS Holiday Home Tour &amp; Winter Market at Friedman Event Center, Sat, Nov 20 10:00a</title>
<description>The Junior League of Napa-Sonoma presents a tour of prestigious homes in Bennett Valley, all festively decorated, with all proceeds to benefit local charities</description>
<link>http://events.pressdemocrat.com/santa-rosa-ca/events/show/139081965-jlns-holiday-home-tour-winter-market</link>
</item>

Fig. 2: Same event in a typical RSS events feed

We humans have no trouble understanding Sat, Nov 20 10:00a. The year is omitted but we know what’s meant. Likewise we can parse a wide range of alternatives, such as Saturday, November 20, at 10:00. Does that mean AM or PM? We just know that it’s AM; a home tour wouldn’t start on Saturday at 10PM. Conversely we just know that a blues band wouldn’t start playing on Saturday at 10AM.
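To make the contrast concrete, here is a minimal Python sketch of how a program reads the start time from an item like the one in Fig. 1. It uses only the standard library. The namespace URI bound to the xCal prefix is an assumption, since the excerpt doesn't show the feed's declaration, and the sample item is trimmed for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed namespace URI for the xCal prefix; Fig. 1 does not show the declaration.
XCAL_NS = "urn:ietf:params:xml:ns:xcal"

# A trimmed copy of the Fig. 1 item, with the namespace declared so it parses alone.
ITEM = f"""
<item xmlns:xCal="{XCAL_NS}">
  <title>Event: JLNS Holiday Home Tour, Sat, Nov 20 10:00a</title>
  <xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>
  <xCal:dtend>2010-11-20 16:00:00 +0000</xCal:dtend>
</item>
"""


def read_start(item_xml: str) -> datetime:
    """Return the event start taken from the xCal:dtstart element."""
    item = ET.fromstring(item_xml)
    text = item.findtext(f"{{{XCAL_NS}}}dtstart")
    if text is None:
        raise ValueError("item has no structured start time")
    # One fixed pattern is enough because the feed states every field explicitly.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z")


start = read_start(ITEM)
print(start.year, start.month, start.day, start.hour)
```

Nothing this simple works for Fig. 2. The title omits the year and abbreviates AM to a single letter, so a program would have to guess at what a person just knows.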

Since we aren’t aware that we hold this tacit knowledge, it doesn’t occur to us that computers lack it, or that as a result they require explicit rules and structure. But if you want your data to syndicate around the web, you’ve got to provide rule-based structure. Since iCalendar is the most widely supported format for event data, that’s currently the best way to do it. Here’s that same event in iCalendar:

BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
PRODID:Zvents Ical
BEGIN:VEVENT
DTSTART:20101120T100000
DTEND:20101120T160000
SUMMARY:Event: JLNS Holiday Home Tour & Winter Market at Friedman
  Event Center, Sat, Nov 20 10:00a
END:VEVENT
END:VCALENDAR

Fig. 3: Same event in an iCalendar feed

A point that technologists often miss, when we fight religious wars amongst ourselves about competing formats — RSS versus Atom, iCalendar versus xCalendar, and so on — is that the existence of structure matters far more than the kind of structure. Fig. 1 and Fig. 3 are two species within the same genus. Fig. 2, though, belongs to another phylum altogether. If you’re using the method shown in Fig. 2 to syndicate your data on the web, you’re doing it wrong. That RSS feed is no more useful for the purpose than a PDF file, or an HTML file.

When I realized that Zvents produces RSS+xCal feeds, and that multiple newspaper sites rely on Zvents, I added support for that format to the elmcity service. A translator reads RSS+xCal and writes iCalendar. Because the Zvents flavor of RSS+xCal is well structured, it was trivial to create that translator.
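A translator of this kind can be sketched in a few lines of Python. This is not the elmcity code, just an illustration under stated assumptions: the function names, the PRODID string, and the xCal namespace URI are all made up for the example. Following Fig. 3, it keeps the wall-clock digits of each time and drops the +0000 offset. A complete VEVENT would also carry UID and DTSTAMP properties and fold long lines, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed namespace URI for the xCal prefix; the excerpt in Fig. 1 omits it.
XCAL_NS = "urn:ietf:params:xml:ns:xcal"


def ical_escape(text: str) -> str:
    """Escape the characters that are special inside an iCalendar TEXT value."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def ical_stamp(xcal_time: str) -> str:
    """Turn '2010-11-20 10:00:00 +0000' into '20101120T100000'.

    Like Fig. 3, this keeps the wall-clock digits and drops the offset.
    """
    parsed = datetime.strptime(xcal_time, "%Y-%m-%d %H:%M:%S %z")
    return parsed.strftime("%Y%m%dT%H%M%S")


def rss_xcal_to_ical(rss_xml: str) -> str:
    """Translate every RSS item that has an xCal:dtstart into a VEVENT."""
    root = ET.fromstring(rss_xml)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//rss-xcal-sketch//EN"]  # hypothetical PRODID
    for item in root.iter("item"):
        start = item.findtext(f"{{{XCAL_NS}}}dtstart")
        if start is None:
            continue  # a Fig. 2 style item: no structured date, so skip it
        lines.append("BEGIN:VEVENT")
        lines.append("DTSTART:" + ical_stamp(start))
        end = item.findtext(f"{{{XCAL_NS}}}dtend")
        if end is not None:
            lines.append("DTEND:" + ical_stamp(end))
        lines.append("SUMMARY:" + ical_escape(item.findtext("title", default="")))
        link = item.findtext("link")
        if link:
            lines.append("URL:" + link)
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF


SAMPLE = f"""<rss version="2.0" xmlns:xCal="{XCAL_NS}"><channel>
<item>
  <title>JLNS Holiday Home Tour &amp; Winter Market</title>
  <link>http://example.org/events/1</link>
  <xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>
  <xCal:dtend>2010-11-20 16:00:00 +0000</xCal:dtend>
</item>
</channel></rss>"""

print(rss_xcal_to_ical(SAMPLE))
```

The translator has no heuristics in it. Every value it writes comes from a named element in the feed, which is why structured input makes this kind of translation easy.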

This new feature for elmcity hubs creates some interesting opportunities. For example, since each Zvents feed is the result of a query, the set of these RSS+xCal feeds is unbounded. Here’s one kind of query used on the Press Democrat’s events page; it lists events in the “Dance” category.

http://events.pressdemocrat.com/search?cat=4&st=event

We can easily transform that URL into one that yields the corresponding RSS feed:

http://events.pressdemocrat.com/search?cat=4&st=event&rss=1

Observing this, Tim Sawtell was able to merge a set of categorized feeds into the Santa Rosa hub. In doing so, he illustrated a number of key principles that computational thinkers know and apply:

  1. query — the feed is the output of an open-ended search
  2. data structure — a structured representation of the search is available as RSS+xCal
  3. transformation — from RSS+xCal to iCalendar
  4. abstraction and generalization – what works for one category works for all
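
The query-to-feed step generalizes because it is a pure URL rewrite. Here is a small Python sketch of that rewrite, using the rss=1 parameter seen in the URLs above. The helper names are invented for the example, and any category numbers other than 4 are hypothetical placeholders, not real Press Democrat categories.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_rss_url(search_url: str) -> str:
    """Return the search URL with rss=1 appended to its query string."""
    parts = urlsplit(search_url)
    # Keep the existing parameters in order, dropping any earlier rss setting.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "rss"]
    query.append(("rss", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def category_feeds(search_base: str, category_ids):
    """Apply the same rewrite to a whole list of category searches."""
    return [to_rss_url(f"{search_base}?cat={cat}&st=event") for cat in category_ids]


print(to_rss_url("http://events.pressdemocrat.com/search?cat=4&st=event"))
# Category 4 is "Dance" in the article; 5 and 6 are hypothetical.
for feed in category_feeds("http://events.pressdemocrat.com/search", [4, 5, 6]):
    print(feed)
```

The same function works unchanged on keyword searches, since it never looks at which parameters the query contains.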

Even more is possible. Suppose you’re a grief counselor in Santa Rosa, and you would like to provide your clientele with a comprehensive list of support resources. Here’s a useful search:

http://events.pressdemocrat.com/search?swhat=bereavement

It yields two recurring events for two different support groups at Hospice By The Bay.

Free Hospice By The Bay Drop-in Group Supports Newly Bereaved
Join others who are beginning the journey through grief at a free, ongoing, drop-in …
10/26/2010 Tuesday
12:00p to 1:00p
(repeats 9 times)
Hospice By The Bay,
Sonoma CA

Hospice By The Bay Support Group for Spousal/Partner Loss
Hospice By The Bay offers an eight-week support group to help adults who have lost …
10/26/2010 Tuesday
10:00a to 11:30a
Hospice By The Bay,
Sonoma CA

Fig. 4: Bereavement support group meetings in Santa Rosa, via the Press Democrat

Here’s a transformation of that search URL that yields an RSS+xCal data feed:

http://events.pressdemocrat.com/search?swhat=bereavement&rss=1

That feed can now be further transformed into an iCalendar feed and included in an elmcity hub, or in any other cloud-based service or device-based app that reads iCalendar feeds. If you wanted to create a bereavement category in an elmcity hub you’d be off to a great start! But where else would you look? There’s plenty of information about public events on the web today. But only a tiny fraction of it exists as structured data that can flow through syndicated networks. Most of it lives in PDF files, or HTML files, that are only valuable to people who find their way to the sites that serve up those files.

In an effort to visualize this iceberg of unstructured information below the waterline of the data web, I added a feature to the elmcity service that searches for recurring events. It works by looking for the kinds of phrases that we humans use in our discourse: “first Monday of every month at 9PM” or “2nd and 4th Tuesday, 6:15-7:45 pm.” In this week’s companion article I show how that search harvests pages containing these terms from Google and Bing. Here, let’s consider a few of the roughly 3,500 items found when running that kind of search for Santa Rosa:

google: 1253
bing: 2023
google_and_bing: 292

1. Hannah Caratti, Pre-Licensed Professional, Santa Rosa, CA 95404 … (google)

Every Monday at 6pm – 7pm $20 – $30 per session. Meditation & Stress Reduction Group … Chronic Pain or Illness Therapist in Santa Rosa, CA …

2. Bob Greenberg, Marriage & Family Therapist, Santa Rosa, CA 95404 … (google)

Every Monday at 12am – 12am $40+ per session. An in depth group for adult …

3. Classes at the Women's Health and Birth Center (google)

Every Monday (except for holiday Mondays). Group/walk-in from 12 noon … Women's Health and Birth Center since 1993, 583 Summerfield Road Santa Rosa, CA 95405.

4. North Bay Bereavement and Grief Support Programs (google)

Every Monday, Noon-1:30 p.m.. Back to top … 547 Mendocino Avenue, Santa Rosa, CA 95401 (Parking garage 521 7th Street) …

Fig. 5: Unstructured event data for Santa Rosa

Investigating the fourth item, North Bay Bereavement and Grief Support Programs, we find a bunch of events represented in an unstructured way:

Bereaved Parents: For parents whose young or adult child has died.
2nd and 4th Thursdays, 6:00 – 7:30 p.m.

Family and Caregiver Support Groups: For adults whose loved one has a life-threatening illness.
Every Tuesday, 4:00-5:30 p.m.

Survivors of Suicide: For those who have lost a loved one to suicide.
Every Monday, Noon-1:30 p.m.

People in Grief: For people whose loved one has died.
Every Wednesday, 6:00-7:30 p.m.

Partner Loss – Evening: For adults whose spouse or partner has died.
2nd and 4th Tuesday, 6:15-7:45 p.m.

Partner Loss – Daytime: For adults whose spouse or partner has died.
Every Wednesday, 11:00 a.m. – 12:30 p.m.

Fig. 6: Unstructured event data about bereavement support groups in Santa Rosa
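
To show what it takes to turn phrases like these into structure, here is a rough Python sketch that maps two common patterns onto iCalendar recurrence rules (the RRULE values defined by the iCalendar standard). It is not the elmcity search feature, and it is deliberately narrow: it ignores times, exceptions such as "except for holiday Mondays," and every phrasing it hasn't been told about. The function name and the patterns are assumptions made for the example.

```python
import re
from typing import Optional

DAYS = {"monday": "MO", "tuesday": "TU", "wednesday": "WE", "thursday": "TH",
        "friday": "FR", "saturday": "SA", "sunday": "SU"}
ORDINALS = {"first": 1, "1st": 1, "second": 2, "2nd": 2,
            "third": 3, "3rd": 3, "fourth": 4, "4th": 4}

DAY = r"(?P<day>" + "|".join(DAYS) + r")s?\b"
ORD = "|".join(ORDINALS)

# "Every Monday", "every Wednesday"
WEEKLY = re.compile(r"\bevery\s+" + DAY, re.IGNORECASE)
# "first Monday", "2nd and 4th Thursdays"
MONTHLY = re.compile(
    rf"\b(?P<ords>(?:{ORD})(?:\s+and\s+(?:{ORD}))*)\s+" + DAY, re.IGNORECASE)


def to_rrule(phrase: str) -> Optional[str]:
    """Return an RRULE value for a recognized recurrence phrase, else None."""
    match = MONTHLY.search(phrase)
    if match:
        day = DAYS[match.group("day").lower()]
        numbers = [ORDINALS[word.lower()]
                   for word in re.findall(ORD, match.group("ords"), re.IGNORECASE)]
        return "FREQ=MONTHLY;BYDAY=" + ",".join(f"{n}{day}" for n in numbers)
    match = WEEKLY.search(phrase)
    if match:
        return "FREQ=WEEKLY;BYDAY=" + DAYS[match.group("day").lower()]
    return None  # no tacit knowledge here: unrecognized phrases stay unstructured


for line in ["2nd and 4th Thursdays, 6:00 – 7:30 p.m.",
             "Every Monday, Noon-1:30 p.m.",
             "first Monday of every month at 9PM",
             "Sat, Nov 20 10:00a"]:
    print(line, "->", to_rrule(line))
```

Even this toy needs a table of weekday names, a table of ordinals, and two hand-built patterns, and it still can't say when the meetings start. That is the cost of recovering structure after the fact instead of publishing it at the source.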

I’m sure the Press Democrat would love to include these events on its calendar. It can’t, though, because there’s only one way for Sutter VNA and Hospice, the organization that publishes that page, to get its support group meetings onto the Press Democrat’s calendar. Somebody has to log into the site and input the data.

That model has never worked well, and it never will. The folks at Sutter VNA and Hospice only want to input that information once, on their own website. And that’s all they should be expected to do! Their site ought to be the authoritative source for both human-readable information about events and machine-readable data that can syndicate to the Press Democrat or to any other site that needs it.

Unfortunately the Sutter VNA folks don’t know about this dual possibility, and don’t realize that they could achieve it using Google Calendar, or Hotmail Calendar, or any other single source of human-readable text and machine-readable data about public events.

Likewise, the Press Democrat does not realize that it could subscribe to a data feed from Sutter VNA, once, and thereafter automatically receive a stream of data as comprehensive and accurate as the authoritative source wishes to provide.

This model for collective information management relies on principles that computational thinkers know and apply, including:

  1. pub/sub — the communication pattern is publish/subscribe
  2. indirection — event data is passed by reference, not by value, from publisher to subscriber
  3. syndication — a loosely-coupled network of publishers and subscribers
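
These three principles fit together in very little code. The Python sketch below is a toy model, not the elmcity implementation: the Hub class, the event dictionaries, and the example.org URL are all invented for illustration, and an in-memory dictionary stands in for the web so the example runs without a network. The hub stores only feed URLs (references). It fetches the events themselves each time it refreshes, so whatever the publisher changes at the source shows up on the next pass.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]                    # e.g. {"start": "...", "summary": "..."}
Fetcher = Callable[[str], List[Event]]    # given a feed URL, return its events


class Hub:
    """A subscriber that merges events from any number of published feeds."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._feeds: List[str] = []       # references to feeds, not copies of data

    def subscribe(self, feed_url: str) -> None:
        """Subscribe once; later refreshes pick up the publisher's changes."""
        if feed_url not in self._feeds:
            self._feeds.append(feed_url)

    def refresh(self) -> List[Event]:
        """Re-read every feed and return all events ordered by start time."""
        merged: List[Event] = []
        for url in self._feeds:
            for event in self._fetch(url):
                merged.append({**event, "source": url})
        return sorted(merged, key=lambda event: event["start"])


# Stand-in for the web: the publisher's calendar lives at a hypothetical URL.
WEB: Dict[str, List[Event]] = {
    "http://example.org/hospice.ics": [
        {"start": "20101101T120000", "summary": "Survivors of Suicide"},
    ],
}

hub = Hub(lambda url: list(WEB.get(url, [])))
hub.subscribe("http://example.org/hospice.ics")
print(len(hub.refresh()))

# The publisher adds a meeting on its own calendar; the hub is not touched.
WEB["http://example.org/hospice.ics"].append(
    {"start": "20101027T180000", "summary": "People in Grief"})
print([event["summary"] for event in hub.refresh()])
```

Nobody logs into the subscriber's site to enter anything. The publisher edits one calendar, and the loose coupling does the rest.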

How might we teach these kinds of principles to the Sutter VNAs and Press Democrats of the world? Maybe we can start by teaching them to the kids we think are digital natives, but who don’t actually learn these principles — because we haven’t formulated them and don’t teach them.

If you teach in a middle school or a high school, here’s an interesting civics lesson you could try. Spin up an elmcity hub for your town, point kids at the unstructured iceberg revealed by the search feature, and show them how to use a service like Google Calendar or Hotmail Calendar to convert unstructured event information into structured event data that can syndicate through the hub.

The task can easily be parallelized by carving the list of search results into chunks, and assigning chunks to individual students or teams of students. Working together they should soon be able to produce a substantial calendar of events that won’t appear in any existing online directory. That calendar will be both a valuable civic contribution and a lesson in underlying principles.

For extra credit, have the students engage with the sources and explain the principles to them. The script might go like this:

Dear Mr. Jones,

We’re students at the Jefferson Middle School, and we’re working on a class project to improve the amount and quality of online event information for our community. We noticed that the following information is available on your website: [EXAMPLES].

However, these events aren’t published in a form that enables them to show up automatically elsewhere — for example, on the Herald’s site, or the Chamber of Commerce site, or on people’s personal calendars. To show how that can work, we have reformulated your information as a data feed. You can see it merged together with other data feeds here: [EXAMPLE].

This is just a demonstration. We’re not the appropriate source for your data; you are. As part of our class project, we’re reaching out to organizations like yours to show them how they can publish their own event information in two ways: as text for people to read, and also as data for computers to process and for networks to syndicate.

We know that sounds complicated, but it’s really just a way of applying the ordinary calendar software that you probably already have and use. May we contact the person in your organization who’s responsible for the events page on your website, and make a presentation about how you could be publishing event information in a more useful way?

Sincerely,

Kayla Smith, Tim Miller, Samantha Williams
Jefferson Middle School Civic Data Project

If you’re not a teacher yourself, but you know teachers who might like to try this project-based exercise in civic data gathering and computational thinking, by all means invite them to contact me. I’ll be happy to help set up the exercise, support it, and document the outcome.
