Two tools we need to improve online information

Everybody in the computer field recognizes that documentation
is moving from print publications, bought and sold in the traditional
fashion, to free Web content. But few people have looked at the
implications for tools. As part of my
research into free online content,
I’ve discovered the need for two innovations that could spur dramatic
improvements.

Note that these tools go beyond free software, or even computer
documentation, and could enhance any online content created by
a wide range of individuals.

Quiz production

Let me ask this of anyone who writes documentation: How good is it?
Think it’s pretty good? How can you tell?

Don’t depend entirely on recommendations or answers to “How useful was
this document?” polls. People sometimes recommend documentation even
though it’s not very good. It might be the only available document
concerning a topic. Until something better comes along, readers might
not even realize that a better treatment is possible. Or sometimes,
people who know a lot of background about a topic think the document
is clear when, in fact, it could be totally impenetrable to less savvy
readers (the ones who really need it).

The question of quality comes down to effectiveness.
This question haunts the field of professional editing in particular.
How can we prove that our efforts are worth the money?

Good editing untangles confusing passages, brings out hidden
background information, and eliminates annoying redundancies as well
as irrelevant content. But–does it matter to the reader? Do the
changes just make the document look better, or do they produce more
capable users in reduced time? Even if the old document took a little
more time to read and puzzle over, maybe the difference isn’t worth
the extra time or money.

Users of free online documentation take all the problems in stride.
Everybody knows that most documentation is hard to follow, but they
just put in extra effort. Some virtual elbow grease, invested in
playing around with the software, may fill the gap between what the
documentation says and how the software actually works. I learned one
tool through what I call “documentation through error messages,”
during which I deliberately wrote bizarre code, read the resulting
error messages, and built up an understanding of the role played by
each argument in each function.

If we ask authors to spend extra time polishing
documentation–especially documentation that’s likely to change
rapidly as the project evolves–we need to demonstrate that readers
benefit substantially. If we bring in professional authors and
editors, we need the evidence even more.

That’s where quizzes help.

Imagine reading a document about some software you want to use, and
coming to a multiple-choice question at the end of each page. The
question doesn’t ask you to regurgitate the facts (because you can
easily scroll back and find the answers), but tests a deep
understanding of the concepts you need to use the software
effectively. Click on an answer, and the JavaScript-backed quiz sends
it to a server that immediately returns a message telling you whether
you’re right.

Many people would be happy to spend a few seconds answering such
questions. Stored in a database, the answers reveal to the author how
effective the document is. And the data culled from multiple users
allows success to be evaluated at many levels:

  • If a site recruits authors, the questions can give the administrators
    an idea whether they should continue employing the authors.

  • A string of failed quizzes can alert an author to the need to fix a
    document.

  • The most interesting value emerges when documents are upgraded. After
    bringing in an expert author or editor, the site administrator can
    compare the number of correct answers before and after the edits. Now
    we have demonstrable measurements of the value of professionalism.

We may well find that some types of polishing are of little value.
Perhaps what works in more traditional publishing contexts doesn’t
work in online computer documentation. For instance, superficial
formatting changes or a couple of extra definitions might do more to help
the reader than intensive work on style.

Sites might use quizzes like this if they were easy to generate. Thus,
the field of documentation needs an application that accepts questions
and a collection of possible answers. The application emits a
collection of HTML and JavaScript that the author can add to the page.
Because there are many JavaScript frameworks and the author may
already be using one in the page, the application should have a
variety of back-ends, allowing the author to use whatever framework is
already in place.
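
A minimal sketch of such a generator follows, assuming a single framework-free back-end. The function name render_quiz, the /quiz/answer URL, and the JSON payload shape are all hypothetical, chosen only for illustration. The emitted page never contains the right answer; the server decides that and replies, as described earlier.

```python
import html
import json
from string import Template

# Plain-JavaScript back-end. The endpoint and the {"correct": ...} reply
# shape are assumptions, not part of any existing system.
_SCRIPT = Template("""<script>
(function () {
  var quiz = document.getElementById($element_id);
  quiz.addEventListener("click", function (event) {
    var choice = event.target.getAttribute("data-choice");
    if (choice === null) { return; }
    fetch($submit_url, {
      method: "POST",
      headers: {"Content-Type": "application/json"},
      body: JSON.stringify({quiz: $quiz_id, choice: Number(choice)})
    })
      .then(function (response) { return response.json(); })
      .then(function (result) {
        quiz.querySelector(".quiz-result").textContent =
          result.correct ? "Correct." : "Not quite.";
      });
  });
})();
</script>""")


def render_quiz(quiz_id, question, answers, submit_url="/quiz/answer"):
    """Return an HTML fragment (markup plus script) for one question."""
    buttons = "\n".join(
        f'  <button type="button" data-choice="{i}">{html.escape(text)}</button>'
        for i, text in enumerate(answers)
    )
    markup = (
        f'<div class="quiz" id="quiz-{html.escape(quiz_id)}">\n'
        f"  <p>{html.escape(question)}</p>\n"
        f"{buttons}\n"
        f'  <p class="quiz-result"></p>\n'
        f"</div>\n"
    )
    # json.dumps yields quoted JavaScript string literals.
    script = _SCRIPT.substitute(
        element_id=json.dumps("quiz-" + quiz_id),
        submit_url=json.dumps(submit_url),
        quiz_id=json.dumps(quiz_id),
    )
    return markup + script
```

Supporting another framework would mean swapping in a different script template while keeping the same markup and payload.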

The resulting code also needs to record the user’s answer in a
database, which could be stored on a server that offers
password-protected access to authors and administrators. They can then
log in and generate reports that show the percentage of right answers
and how different versions of a document stack up. I have a prototype
of this system that I demonstrate at conferences.
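
The storage and reporting side could be sketched as below, using SQLite from the Python standard library. The table layout and function names are assumptions made for this example, and the password-protected login is left out entirely.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the answer database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS answers (
               doc_version TEXT    NOT NULL,
               quiz_id     TEXT    NOT NULL,
               correct     INTEGER NOT NULL
           )"""
    )
    return conn


def record_answer(conn, doc_version, quiz_id, correct):
    """Store one reader's answer as right (1) or wrong (0)."""
    conn.execute(
        "INSERT INTO answers (doc_version, quiz_id, correct) VALUES (?, ?, ?)",
        (doc_version, quiz_id, int(correct)),
    )
    conn.commit()


def percent_correct_by_version(conn):
    """Map each document version to its percentage of right answers."""
    rows = conn.execute(
        """SELECT doc_version, 100.0 * SUM(correct) / COUNT(*)
           FROM answers
           GROUP BY doc_version
           ORDER BY doc_version"""
    )
    return {version: percent for version, percent in rows}
```

Comparing the figures for the versions before and after an edit gives the kind of measurement discussed above, provided the questions themselves stay the same.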

Admittedly, good questions are hard to write. The author might not be
the best person to think up questions, because the author’s choices
might reflect what’s on the surface of the document rather than more
crucial underlying concepts. In any case, wording is critical–because
a single poorly chosen word could render a question ambiguous or
confusing–so quiz questions should be tested on a sample of the
readership. Luckily, each document can be tested reliably with just a
handful of questions.

Cross-reference management

The amount of content you can get online is unimaginably huge. The
generosity of the public is copious, and popular software often draws
dozens of explanations, all posted freely.

In fact, the new tragedy of the commons is oversupply. Every
week or so I hear someone musing, “I ought to blog more.” These people
have heard that, in an increasing number of fields, you don’t exist
for employers or potential collaborators unless you have a powerful
online presence. I understand this too and encourage these colleagues
to blog, but I ask, “What do you have to say?” For all too many
bloggers, that’s not a criterion.

Of course, the oversupply of content is just the flip side of an
undersupply that we know as the traditional tragedy of the
commons. The undersupply in this case is our time and attention.

This abundance of blogs, web pages, and mailing lists, along with the
decentralization of the Internet, makes content discovery hard.
Reputation and rating systems, which I’ve dealt with in other
articles, might provide part of the solution–if we can develop
reliable systems and get people to use them. But before that, we have
to make sure readers can find the documents they need.

Even when found, a document may be of little use because the reader
lacks the background needed to understand it. Often a reader gives up
after finding that the document requires knowledge of an unfamiliar
tool, or after trying out a procedure that doesn’t work because the
reader is expected to alter the procedure to match his or her
particular working environment.

Some sites contain pointers to prerequisites and follow-up documents.
They may do this formally (through lists that appear near the top of
the page) or informally (“If you need more information on this topic,
read…”). But we could streamline the whole process by:

  1. Making it easy for readers to suggest prerequisites and follow-up
    documents

  2. Generating paths through documents so the potential reader has an
    entire syllabus

The first goal might be implemented like this: an author puts a form
at the end of a document, requesting cross-references. A reader can
enter a URL and a topic that it covers. The form can also indicate
whether the document should be read before the current one (in other
words, it’s more introductory) or after (in other words, it’s more
advanced). The author ultimately evaluates the suggestion; software
assistance should also be available to make it easy to include a link
in a document in a standard format. Finding the cross-references may
also be a task where publishers can add value to a community.
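
One way to picture the data behind such a form is the sketch below. The field names, the validation rules, and the data-topic and data-position attributes are hypothetical; no standard link format for this exists yet.

```python
import html
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Suggestion:
    url: str
    topic: str
    position: str  # "before" = more introductory, "after" = more advanced


def validate(suggestion):
    """Return a list of problems; an empty list means the input is usable."""
    problems = []
    parsed = urlparse(suggestion.url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http or https address")
    if not suggestion.topic.strip():
        problems.append("topic is required")
    if suggestion.position not in ("before", "after"):
        problems.append("position must be 'before' or 'after'")
    return problems


def render_link(suggestion):
    """Render an accepted suggestion as a link (attribute names are made up)."""
    label = "Read first" if suggestion.position == "before" else "Read next"
    return (
        f'<a href="{html.escape(suggestion.url)}" '
        f'data-topic="{html.escape(suggestion.topic)}" '
        f'data-position="{suggestion.position}">'
        f"{label}: {html.escape(suggestion.topic)}</a>"
    )
```

The author would call render_link only after reviewing the suggestion, which keeps the final decision in human hands.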

The second goal requires a protocol–comparable to RSS or Atom–that
can be used by tools to crawl documents and produce simple graphs. For
instance, a document about web page layout might refer back to
pages about HTML, the DOM model, and CSS. The document about the DOM
model might refer to pages about XHTML, which in turn refer back to
pages about XML. Tags or keywords associated with each link (such as
“DOM”) allow users to associate multiple pages with a topic, providing
alternative documents and paths for people interested in that topic.

The crawler finds the trail among the documents and generates a list
of possible paths for the reader to follow. Now the reader can choose
whatever background he or she needs. One web designer wants to stick
to simple HTML, while another wants a more robust page conforming to
the DOM model. Each can find a good starting point.
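
The path-finding step can be sketched as follows, assuming the crawler has already collected the links into a dictionary that maps each document to its prerequisites. The function name and the data layout are hypothetical, and fetching and parsing pages is omitted because the protocol does not exist yet.

```python
def reading_paths(prerequisites, target, _trail=()):
    """List every reading path that ends at `target`.

    Each path starts at a document with no recorded prerequisites and
    follows the links forward. Chains that loop back on themselves are
    dropped.
    """
    if target in _trail:
        return []  # cycle: this chain never reaches a starting point
    trail = (target,) + _trail
    parents = prerequisites.get(target, [])
    if not parents:
        return [list(trail)]
    paths = []
    for parent in parents:
        paths.extend(reading_paths(prerequisites, parent, trail))
    return paths


# The example from the text: layout refers back to HTML, the DOM, and CSS.
links = {
    "layout": ["html", "dom", "css"],
    "dom": ["xhtml"],
    "xhtml": ["xml"],
}
for path in reading_paths(links, "layout"):
    print(" -> ".join(path))
# html -> layout
# xml -> xhtml -> dom -> layout
# css -> layout
```

The designer who wants simple HTML takes the first path, while the one who wants the DOM starts from XML.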

Projects, social networks, and self-organizing communities could
maintain portals, and perhaps even deploy their own crawlers, that
record recommended documents and display the paths between them.

When document A adds a link to document B, the author of document B
should receive a ping asking him or her to reciprocate. A link that
both sides agree on is considered much more reliable than a link made
by just one side. A lot of authors would like to link to popular
documents, hoping to ride on their coattails. So if the author of
document B says, “yes, make this link,” it’s considered to be highly
reliable. If the author just fails to respond, the link should still
receive some consideration but be rated as less reliable. And if the
author says, “No, this link is not appropriate,” crawlers should
reject it.
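
These three outcomes could be modeled roughly as below. The numeric weights are placeholders invented for the example; only their ordering (accepted above unanswered, rejected dropped) comes from the rules just described.

```python
from enum import Enum


class Reply(Enum):
    ACCEPTED = "accepted"  # target author said "yes, make this link"
    NO_REPLY = "no_reply"  # target author never answered the ping
    REJECTED = "rejected"  # target author said the link is not appropriate


# Hypothetical weights: any values work as long as ACCEPTED > NO_REPLY.
RELIABILITY = {Reply.ACCEPTED: 1.0, Reply.NO_REPLY: 0.5}


def rate_links(links):
    """Drop rejected links and sort the rest, most reliable first.

    `links` is an iterable of (source, target, Reply) tuples.
    """
    rated = [
        (source, target, RELIABILITY[reply])
        for source, target, reply in links
        if reply is not Reply.REJECTED
    ]
    return sorted(rated, key=lambda link: link[2], reverse=True)
```

A crawler following this scheme would run rate_links before building paths, so rejected links never appear in a syllabus.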

Thus, the protocol has relevance ratings attached to each link, and
these can be displayed by crawlers and the portals that include the
resulting paths. In addition to relevance assigned by authors on both
sides of the link, ratings can be influenced by direct user comments
and by records of the number of times the links were followed. One
could consider multiple scales for rating documents (such as quality,
relevance, and degree to which they are up-to-date) but such plans
could quickly become too complex for average readers.

Designing tools and protocols for generating documentation paths could
be complex, but the payoffs are incalculable. The rowdy, disorganized
web could reclaim one of the key advantages of a book–its logical
organization. Readers would no longer be lost in hyperspace.

Furthermore, the system would encourage more documentation. Good
authors could expect to get more readers, garnered through appropriate
links from popular documents. Gaps in documentation would also be
revealed, and aspiring authors could make themselves useful by
focusing on places where the need is greatest.

Community members’ efforts to inform themselves and their peers are
vibrant and exciting. The energy and dedication of contributors are
beyond doubt. But we are not making the most of their talents and
efforts. The field has reached the point where it needs some formal
tools and practices to move forward and be a true educational resource
for the twenty-first century.

  • Two compelling frontiers. Thanks, Andy. Here are some small user-centric add-ons.

    1) The quizzes should be multiple choice, and each answer should prompt an explanation of why it is right or wrong. That way, the quizzes are more explicitly a review for the user, and increase her motivation to take them. (For the same reason, a user should be allowed to try every question-answer pair after the first, consequential answering.)

    2) If a user had access to a literal graph of ALL possible paths, and a way for users to quickly sample content from that graph (like a Netflix rollover for each node/button), they could blaze their own path if a suggested path did not appeal to them.


  • Rating, cross referencing and META tagging of content is crucial for us to continue organizing this mess of information we have created.

    Definitely needs to be a group effort to make sure we standardize our platforms to work together and work to assist the quality content to float to the top.

    I think some of the leaders like wordpress are helping standardize platform offerings.

  • Scott Gray

    If you haven’t read it, pick up a copy of “The Nurnberg Funnel: Designing Minimalist Instruction for Practical Computer Skill (Technical Communication, Multimedia, and Information Systems)” by John Carroll.

    Some of the things you’re suggesting sound like suggestions from that book.

    According to Carroll virtually every study done on software documentation supports “minimal documentation” which suggests that users learning software don’t like to read but instead prefer to plunge into the system. Users tend to look for information that matches what they are trying to accomplish.

    Carroll suggests that we eliminate information that gets in the way of learning. Most of the time we include too much information in manuals and documentation because we don’t know what the user needs to know or what they’re trying to do.

    Consider applying some of Carroll’s minimalist principles as you build a design around what the users want to accomplish, how they will proceed, and how they will make errors and recover from them:
    • Allow users to get started fast: take an action-centered (or user-centered) approach by giving users enough information to get their real tasks done right away. Don’t try to cover every function; focus on the users’ actions and not the products’ functions. Get users engaged quickly by omitting long introductions and cutting down on repetition and
    • Rely on users to think and improvise: provide enough information so the users will explore on their own and discover solutions to specific problems.
    • Exploit what people already know: use metaphors and similes to help users relate and learn.
    • Support error recognition and recovery. Errors can’t be avoided, but you can provide error information that supports error detection, diagnosis, and recovery.

    Accordingly, it would be nice if the tools themselves could integrate with the type of documentation you’re talking about. For instance, if I’m using Eclipse and I’d like to be able to perform a particular task, I’d like to search from within Eclipse and, if I find an example someone has created, be able to load it directly into Eclipse. Or if I had an error, an automatic search of such errors would be performed and likely solutions of the type you’re discussing would be available to me.

  • Scott Gray

    The NICE thing about all this disorganized documentation is that you can almost always find someone who had the same problem you did and found a solution you can use.

    Most documentation is written from the perspective that people learn from the documentation for learning’s sake…they don’t…they have some particular problem they’re trying to solve and want a solution as quickly as they can get it. First they just try to figure it out themselves, then they go to the internet to try to find it. For now, that solution seems to be best found by using Google.

    I don’t think people looking for solutions will want to take the time to take a quiz to see if they can solve their problem. I think they’d rather just see if they can solve their problem by solving it.

  • moya

    really intriguing ideas; thanks andy.
    regarding cross-referencing, what is the potential of social bookmarking as a plausible tool for sorting through online information? i’ve also been wondering about the potential of recent web2summit wunderkind Twine for social-networking/bookmarking/semantically scraping in order to “cross-reference” given topics along multiple axes.

  • Jane Hadley

    Great ideas here. But, as somebody who is mostly self-taught and uses online and offline documentation a lot, my opinion is that there is no substitute for excellent search, tables of content, and comprehensive, thorough indexes.

    Mr. Carroll’s hunch about people plunging into a project and looking for specific information applies to much that I do. When I go to documentation, I’m using search, tables of contents and/or indexes to quickly find what I want.

  • Thanks for the great comments. (And yes, I’ve read several articles by John Carroll, although I haven’t read his book yet.)

    Carroll inspired and influenced me, but he seemed to be testing mostly end-user systems with graphical interfaces (fairly novel for most people in the 80s) and I don’t believe the “minimal” documentation you need for most systems is very small.

    There are two ways to go with almost any hyperlink technology: ask users to make connections manually, or use a sophisticated tool, such as Twine appears to be. Of course, we can do both. Twine can add choices and tweak ratings.

    I know the history of the semantic web: asking users to add meta-information doesn’t work as well as developing better search engines and other tools to do the work automatically. Twine looks like it makes good use of the work users do, and builds on it.

    I think user recommendations are valuable for creating document paths. I tried to suggest a system that requires very little effort for a big positive impact. I’d like to see whether something like Twine can help too.

  • bowerbird

    sorry andy, but scott gray is directly on-target:
    find out what they want to do, then show them how
    and tell ’em how they shoulda asked the question.


  • The importance of cross-reference management is a big deal for many fields, and some of the existing organizational structures could be improved and then mapped to the appropriate cross-reference management system/methods if they were readily available. I work in a digital library center and we’re creating tons of great digital content, but then we need to make that content usable.

    The existing documentation for traditional libraries has been focused on opening the closed content, closed to physical access and within closed databases. Moving the traditional systems of organization (with so many subject guides, class guides, how-to guides, and other documentation types) into a form that works for both closed and open systems is difficult, as is figuring out how to make these modular and reusable since so many have always been developed in relation to the particularities of the physical library or the multiple closed databases. Reformatting the information on the subject area or field and then showing how the resources support the larger inquiry questions is a lot of work, and it would be ideal to have a great cross-referencing system to make this work usable for other libraries and for other fields since so much of the information could be useful for anyone. Then, the modular information could be used for traditional library and academic goals (research, class projects) and for explaining and organizing larger structures or research and data.