Two tools we need to improve online information

Everybody in the computer field recognizes that documentation
is moving from print publications, bought and sold in the traditional
fashion, to free Web content. But few people have looked at the
implications for tools. As part of my
research into free online content,
I’ve discovered the need for two innovations that could spur dramatic
improvements.

Note that these tools go beyond free software, or even computer
documentation, and could enhance any online content created by
a wide range of individuals.

Quiz production

Let me ask this of anyone who writes documentation: How good is it?
Think it’s pretty good? How can you tell?

Don’t depend entirely on recommendations or answers to “How useful was
this document?” polls. People sometimes recommend documentation even
though it’s not very good. It might be the only available document
concerning a topic. Until something better comes along, readers might
not even realize that a better treatment is possible. Or sometimes,
people who know a lot of background about a topic think the document
is clear when, in fact, it could be totally impenetrable to less savvy
readers (the ones who really need it).

The question of quality comes down to effectiveness.
This question haunts the field of professional editing in particular.
How can we prove that our efforts are worth the money?

Good editing untangles confusing passages, brings out hidden
background information, and eliminates annoying redundancies as well
as irrelevant content. But does it matter to the reader? Do the
changes just make the document look better, or do they produce more
capable users in less time? Even if the old document takes a little
longer to read and puzzle over, maybe the difference isn’t worth the
extra time or money.

Users of free online documentation take all the problems in stride.
Everybody knows that most documentation is hard to follow, but they
just put in extra effort. Some virtual elbow grease, invested in
playing around with the software, may fill the gap between what the
documentation says and how the software actually works. I learned one
tool through what I call “documentation through error messages,”
during which I deliberately wrote bizarre code, read the resulting
error messages, and built up an understanding of the role played by
each argument in each function.

If we ask authors to spend extra time polishing
documentation–especially documentation that’s likely to change
rapidly as the project evolves–we need to demonstrate that readers
benefit substantially. If we bring in professional authors and
editors, we need the evidence even more.

That’s where quizzes help.

Imagine reading a document about some software you want to use, and
coming to a multiple-choice question at the end of each page. The
question doesn’t ask you to regurgitate the facts (because you can
easily scroll back and find the answers), but tests a deep
understanding of the concepts you need to use the software
effectively. Click on an answer, and the JavaScript-backed quiz sends
it to a server that immediately returns a message telling you whether
you’re right.
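To make this concrete, here is a minimal sketch of what such a quiz
could look like on the page. The question, the /quiz-answer endpoint,
and the field names are all invented for illustration; a real site
would substitute its own.

    <form id="quiz">
      <p>Why does the function return a promise instead of a value?</p>
      <label><input type="radio" name="q1" value="0"> It runs synchronously</label>
      <label><input type="radio" name="q1" value="1"> The work finishes later</label>
      <label><input type="radio" name="q1" value="2"> It never produces a result</label>
    </form>
    <script>
    // Send the chosen answer to the server and report whether it was right.
    document.getElementById('quiz').addEventListener('change', async (event) => {
      const response = await fetch('/quiz-answer', {   // hypothetical endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question: 'q1', answer: event.target.value })
      });
      const result = await response.json();            // e.g. { correct: true }
      alert(result.correct ? 'Right!' : 'Not quite; try again.');
    });
    </script>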

Many people would be happy to spend a few seconds answering such
questions. Stored in a database, the answers reveal to the author how
effective the document is. And the data culled from multiple users
allows success to be evaluated at many levels:

  • If a site recruits authors, the questions can give the administrators
    an idea of whether they should continue employing those authors.

  • A string of failed quizzes can alert an author to the need to fix a
    document.

  • The most interesting value emerges when documents are upgraded. After
    bringing in an expert author or editor, the site administrator can
    compare the number of correct answers before and after the edits. Now
    we have demonstrable measurements of the value of professionalism.

We may well find that some types of polishing are of little value.
Perhaps what works in more traditional publishing contexts doesn’t
work in online computer documentation. For instance, superficial
formatting changes or a couple of extra definitions might do more to
help the reader than intensive work on style.

Sites might use quizzes like this if they were easy to generate. Thus,
the field of documentation needs an application that accepts questions
and a collection of possible answers. The application emits a
collection of HTML and JavaScript that the author can add to the page.
Because there are many JavaScript frameworks and the author may
already be using one in the page, the application should have a
variety of back-ends, allowing the author to use whatever framework is
already in place.
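One way such a generator might work (the function name, the back-end
interface, and the endpoint here are my own assumptions, not an
existing tool) is to accept a plain question object and delegate the
submission code to whichever framework adapter the author chooses:

    // Hypothetical generator: turns a question description into markup plus
    // glue code for whatever JavaScript framework the page already uses.
    function generateQuiz(question, backend) {
      const options = question.answers
        .map((text, i) =>
          `<label><input type="radio" name="${question.id}" value="${i}"> ${text}</label>`)
        .join('\n  ');
      const html = `<form class="doc-quiz" data-question="${question.id}">\n` +
                   `  <p>${question.text}</p>\n  ${options}\n</form>`;
      // Each back-end emits submission code in its own idiom.
      return html + '\n' + backend.submitScript(question.id);
    }

    // A plain-DOM back-end; others would produce framework-specific glue.
    const vanillaBackend = {
      submitScript: (id) => `<script>
    document.querySelector('[data-question="${id}"]').addEventListener('change', (e) =>
      fetch('/quiz-answer', { method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question: '${id}', answer: e.target.value }) }));
    </script>`
    };

    // Usage: the author pastes the returned string into the page.
    console.log(generateQuiz(
      { id: 'q1',
        text: 'Why does the call return a promise?',
        answers: ['It runs synchronously', 'The work finishes later'] },
      vanillaBackend));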

The resulting code also needs to record the user’s answer in a
database, which could be stored on a server that offers
password-protected access to authors and administrators. They can then
log in and generate reports that show the percentage of right answers
and how different versions of a document stack up. I have a prototype
of this system that I demonstrate at conferences.
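As a rough sketch only (not my prototype), the server side might look
like the following, with Express and in-memory storage standing in for
whatever a real system would use behind a password:

    // Sketch of the recording and reporting side; routes and storage are
    // illustrative stand-ins for a real database-backed service.
    const express = require('express');
    const app = express();
    app.use(express.json());

    const answers = [];   // each entry: { question, version, correct }

    app.post('/quiz-answer', (req, res) => {
      const { question, answer, version } = req.body;
      const correct = answer === correctAnswerFor(question);   // hypothetical answer key
      answers.push({ question, version, correct });
      res.json({ correct });
    });

    // In practice this report would sit behind a password; it shows the
    // right-answer rate for each version of the document.
    app.get('/report', (req, res) => {
      const byVersion = {};
      for (const a of answers) {
        const v = (byVersion[a.version] ??= { total: 0, right: 0 });
        v.total += 1;
        if (a.correct) v.right += 1;
      }
      res.json(byVersion);   // e.g. { "draft": { total: 40, right: 22 }, "edited": ... }
    });

    function correctAnswerFor(question) { return '1'; }   // placeholder

    app.listen(3000);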

Admittedly, good questions are hard to write. The author might not be
the best person to think up questions, because the author’s choices
might reflect what’s on the surface of the document rather than the
more crucial underlying concepts. In any case, wording is critical–because
a single poorly chosen word could render a question ambiguous or
confusing–so quiz questions should be tested on a sample of the
readership. Luckily, each document can be tested reliably with just a
handful of questions.

Cross-reference management

The amount of content you can get online is unimaginably huge. The
generosity of the public is copious, and popular software often draws
dozens of explanations, all posted freely.

In fact, the new tragedy of the commons is oversupply. Every
week or so I hear someone musing, “I ought to blog more.” These people
have heard that, in an increasing number of fields, you don’t exist
for employers or potential collaborators unless you have a powerful
online presence. I understand this too and encourage these colleagues
to blog, but I ask, “What do you have to say?” For all too many
bloggers, that’s not a criterion.

Of course, the oversupply of content is just the flip side of an
undersupply that we know as the traditional tragedy of the
commons. The undersupply in this case is our time and attention.

This abundance of blogs, web pages, and mailing lists, along with the
decentralization of the Internet, makes content discovery hard.
Reputation and rating systems, which I’ve dealt with in other
articles, might provide part of the solution–if we can develop
reliable systems and get people to use them. But before that, we have
to make sure readers can find the documents they need.

Even when found, documents may be of little use because the reader
lacks the background needed to understand them. Often a reader gives up
after finding that the document requires knowledge of an unfamiliar
tool, or after trying out a procedure that doesn’t work because the
reader is expected to alter the procedure to match his or her
particular working environment.

Some sites contain pointers to prerequisites and follow-up documents.
They may do this formally (through lists that appear near the top of
the page) or informally (“If you need more information on this topic,
read…”). But we could streamline the whole process by:

  1. Making it easy for readers to suggest prerequisites and follow-up
    documents

  2. Generating paths through documents so the potential reader has an
    entire syllabus

The first goal might be implemented like this: an author puts a form
at the end of a document, requesting cross-references. A reader can
enter a URL and a topic that it covers. The form can also indicate
whether the document should be read before the current one (in other
words, it’s more introductory) or after (in other words, it’s more
advanced). The author ultimately evaluates the suggestion; software
assistance should also be available to make it easy to include a link
in a document in a standard format. Finding the cross-references may
also be a task where publishers can add value to a community.
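A sketch of such a form follows; the field names and the
/suggest-crossref address are illustrative, not an existing convention.

    <!-- Illustrative cross-reference suggestion form. -->
    <form action="/suggest-crossref" method="post">
      <label>URL of a related document: <input type="url" name="url"></label>
      <label>Topic it covers: <input type="text" name="topic"></label>
      <label>Read it
        <select name="relation">
          <option value="before">before this document (more introductory)</option>
          <option value="after">after this document (more advanced)</option>
        </select>
      </label>
      <button type="submit">Suggest a cross-reference</button>
    </form>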

The second goal requires a protocol–comparable to RSS or Atom–that
can be used by tools to crawl documents and produce simple graphs. For
instance, a document about web page layout might refer back to
pages about HTML, the DOM model, and CSS. The document about the DOM
model might refer to pages about XHTML, which in turn refer back to
pages about XML. Tags or keywords associated with each link (such as
“DOM”) allow users to associate multiple pages with a topic, providing
alternative documents and paths for people interested in that topic.
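No such protocol exists yet; as one possible shape, a page about web
page layout could declare its links in machine-readable form, with rel
values and a topic attribute of my own invention:

    <!-- Hypothetical markup in the head of a page about web page layout. -->
    <head>
      <link rel="prerequisite" href="https://example.org/html-basics" data-topic="HTML">
      <link rel="prerequisite" href="https://example.org/dom-intro"   data-topic="DOM">
      <link rel="prerequisite" href="https://example.org/css-primer"  data-topic="CSS">
      <link rel="follow-up"    href="https://example.org/responsive"  data-topic="layout">
    </head>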

The crawler finds the trail among the documents and generates a list
of possible paths for the reader to follow. Now the reader can choose
whatever background he or she needs. One web designer wants to stick
to simple HTML, while another wants a more robust page conforming to
the DOM model. Each can find a good starting point.
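The path-finding step of such a crawler might look something like this
sketch, which skips the fetching and parsing and simply walks a graph
built from the hypothetical prerequisite links shown above:

    // Enumerate reading paths that end at a target document.
    // graph maps a document to the documents it lists as prerequisites.
    function pathsTo(target, graph, seen = new Set()) {
      if (seen.has(target)) return [];            // avoid cycles
      const prereqs = graph[target] || [];
      if (prereqs.length === 0) return [[target]];
      const paths = [];
      for (const p of prereqs) {
        for (const path of pathsTo(p, graph, new Set(seen).add(target))) {
          paths.push([...path, target]);          // prerequisites come first
        }
      }
      return paths;
    }

    // Toy graph mirroring the example in the text.
    const graph = {
      'layout': ['HTML', 'DOM', 'CSS'],
      'DOM': ['XHTML'],
      'XHTML': ['XML'],
    };
    console.log(pathsTo('layout', graph));
    // -> [['HTML','layout'], ['XML','XHTML','DOM','layout'], ['CSS','layout']]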

Projects, social networks, and self-organizing communities could
maintain portals that record recommended documents and display the
paths between them, and perhaps even deploy their own crawlers.

When document A adds a link to document B, the author of document B
should receive a ping asking him or her to reciprocate. A link that
both sides agree on is considered much more reliable than a link made
by just one side. A lot of authors would like to link to popular
documents, hoping to ride on their coattails. So if the author of
document B says, “yes, make this link,” it’s considered to be highly
reliable. If the author just fails to respond, the link should still
receive some consideration but be rated as less reliable. And if the
author says, “No, this link is not appropriate,” crawlers should
reject it.

Thus, the protocol has relevance ratings attached to each link, and
these can be displayed by crawlers and the portals that include the
resulting paths. In addition to relevance assigned by authors on both
sides of the link, ratings can be influenced by direct user comments
and by records of the number of times the links were followed. One
could consider multiple scales for rating documents (such as quality,
relevance, and degree to which they are up to date), but such plans
could quickly become too complex for average readers.
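To illustrate how these signals might combine (the weights here are
arbitrary and purely illustrative), a crawler could reduce each link to
a single score that portals then display:

    // Illustrative scoring of a cross-reference link: reciprocation status,
    // reader votes, and click-throughs all feed one relevance figure.
    function linkScore(link) {
      if (link.status === 'rejected') return 0;          // the linked author said no
      let score = link.status === 'confirmed' ? 1.0      // both authors agreed
                : 0.4;                                    // no response: counted, but trusted less
      score += 0.1 * (link.upvotes - link.downvotes);    // direct user comments/votes
      score += 0.05 * Math.log1p(link.timesFollowed);    // how often readers follow the link
      return Math.max(score, 0);
    }

    console.log(linkScore({ status: 'confirmed', upvotes: 3, downvotes: 0, timesFollowed: 120 }));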

Designing tools and protocols for generating documentation paths could
be complex, but the payoffs are incalculable. The rowdy, disorganized
web could reclaim one of the key advantages of a book–its logical
organization. Readers would no longer be lost in hyperspace.

Furthermore, the system would encourage more documentation. Good
authors could expect to get more readers, garnered through appropriate
links from popular documents. Gaps in documentation would also be
revealed, and aspiring authors could make themselves useful by
focusing on places where the need is greatest.

Community members’ efforts to inform themselves and their peers are
vibrant and exciting. The energy and dedication of contributors are
beyond doubt. But we are not making the most of their talents and
efforts. The field has reached the point where it needs some formal
tools and practices to move forward and be a true educational resource
for the twenty-first century.