Thu, Jan 31, 2008

Andy Oram

Two tools we need to improve online information

Everybody in the computer field recognizes that documentation is moving from print publications, bought and sold in the traditional fashion, to free Web content. But few people have looked at the implications for tools. As part of my research into free online content, I've discovered the need for two innovations that could spur dramatic improvements.

Note that these tools go beyond free software, or even computer documentation, and could enhance any online content created by a wide range of individuals.

Quiz production

Let me ask this of anyone who writes documentation: How good is it? Think it's pretty good? How can you tell?

Don't depend entirely on recommendations or answers to "How useful was this document?" polls. People sometimes recommend documentation even though it's not very good. It might be the only available document concerning a topic. Until something better comes along, readers might not even realize that a better treatment is possible. Or sometimes, people who know a lot of background about a topic think the document is clear when, in fact, it could be totally impenetrable to less savvy readers (the ones who really need it).

The question of quality comes down to effectiveness. This question haunts the field of professional editing in particular. How can we prove that our efforts are worth the money?

Good editing untangles confusing passages, brings out hidden background information, and eliminates annoying redundancies as well as irrelevant content. But--does it matter to the reader? Do the changes just make the document look better, or do they produce more capable users in reduced time? Even if the old document took a little more time to read and puzzle over, maybe the difference isn't worth the extra time or money.

Users of free online documentation take all the problems in stride. Everybody knows that most documentation is hard to follow, but they just put in extra effort. Some virtual elbow grease, invested in playing around with the software, may fill the gap between what the documentation says and how the software actually works. I learned one tool through what I call "documentation through error messages," during which I deliberately wrote bizarre code, read the resulting error messages, and built up an understanding of the role played by each argument in each function.

If we ask authors to spend extra time polishing documentation--especially documentation that's likely to change rapidly as the project evolves--we need to demonstrate that readers benefit substantially. If we bring in professional authors and editors, we need the evidence even more.

That's where quizzes help.

Imagine reading a document about some software you want to use, and coming to a multiple-choice question at the end of each page. The question doesn't ask you to regurgitate the facts (because you can easily scroll back and find the answers), but tests a deep understanding of the concepts you need to use the software effectively. Click on an answer, and the JavaScript-backed quiz sends it to a server that immediately returns a message telling you whether you're right.
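
On the client side, such a widget could be just a few lines of script. Here is a minimal sketch in TypeScript; the /quiz/answer endpoint, the JSON shapes, and the CSS class names are all invented for illustration, not an existing API:

    // Hypothetical client-side quiz widget. The /quiz/answer endpoint and
    // its JSON response shape are assumptions for this sketch.
    interface QuizResponse {
      correct: boolean;
    }

    async function submitAnswer(quizId: string, choice: number): Promise<QuizResponse> {
      // Send the reader's choice to the server for immediate feedback.
      const res = await fetch("/quiz/answer", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ quizId, choice }),
      });
      return res.json();
    }

    // Wire each answer button to the handler.
    document.querySelectorAll<HTMLButtonElement>(".quiz-choice").forEach((btn) => {
      btn.addEventListener("click", async () => {
        const { correct } = await submitAnswer(btn.dataset.quizId ?? "", Number(btn.dataset.choice));
        const result = btn.closest(".quiz")?.querySelector(".quiz-result");
        if (result) {
          result.textContent = correct ? "Right!" : "Not quite; review the page and try again.";
        }
      });
    });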

Many people would be happy to spend a few seconds answering such questions. Stored in a database, the answers reveal to the author how effective the document is. And the data culled from multiple users allows success to be evaluated at many levels:

  • If a site recruits authors, the questions can give the administrators an idea of whether they should continue employing those authors.
  • A string of failed quizzes can alert an author to the need to fix a document.
  • The most interesting value emerges when documents are upgraded. After bringing in an expert author or editor, the site administrator can compare the number of correct answers before and after the edits. Now we have demonstrable measurements of the value of professionalism.

We may well find that some types of polishing are of little value. Perhaps what works in more traditional publishing contexts doesn't work in online computer documentation. For instance, superficial formatting changes or a couple of extra definitions might do more to help the reader than intensive work on style.

Sites might use quizzes like this if they were easy to generate. Thus, the field of documentation needs an application that accepts questions and a collection of possible answers. The application emits a collection of HTML and JavaScript that the author can add to the page. Because there are many JavaScript frameworks and the author may already be using one in the page, the application should have a variety of back-ends, allowing the author to use whatever framework is already in place.
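
To suggest what such a generator might look like, here is a sketch with a pluggable back-end; every name in it is hypothetical. The back-end emits the HTML fragment, while a script like the one sketched earlier handles submission:

    // Hypothetical quiz generator: takes a question definition and emits an
    // HTML fragment the author can paste into a page. A Backend is one of the
    // pluggable, framework-specific renderers described above.
    interface Question {
      id: string;
      prompt: string;
      choices: string[];
    }

    type Backend = (q: Question) => string;

    // A plain-HTML back-end; others could target whatever framework the page uses.
    const plainHtmlBackend: Backend = (q) => {
      const buttons = q.choices
        .map((text, i) =>
          `<button class="quiz-choice" data-quiz-id="${q.id}" data-choice="${i}">${text}</button>`)
        .join("\n  ");
      return `<div class="quiz">\n  <p>${q.prompt}</p>\n  ${buttons}\n  <p class="quiz-result"></p>\n</div>`;
    };

    function emitQuiz(q: Question, backend: Backend = plainHtmlBackend): string {
      return backend(q);
    }

    console.log(emitQuiz({
      id: "ch3-q1",
      prompt: "Which argument controls the retry behavior?",
      choices: ["timeout", "retries", "backoff"],
    }));

An author would run the tool once per question and paste the output into the page.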

The resulting code also needs to record the user's answer in a database, which could be stored on a server that offers password-protected access to authors and administrators. They can then log in and generate reports that show the percentage of right answers and how different versions of a document stack up. I have a prototype of this system that I demonstrate at conferences.
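
The report itself could come down to a simple aggregation over the stored answers. Here is a sketch of the core calculation; the record shape is illustrative, not taken from the prototype:

    // Percentage of correct answers per document version, so an administrator
    // can compare a document before and after an edit.
    interface AnswerRecord {
      docVersion: number;
      correct: boolean;
    }

    function correctRateByVersion(records: AnswerRecord[]): Map<number, number> {
      const totals = new Map<number, { right: number; all: number }>();
      for (const r of records) {
        const t = totals.get(r.docVersion) ?? { right: 0, all: 0 };
        t.all += 1;
        if (r.correct) t.right += 1;
        totals.set(r.docVersion, t);
      }
      // Convert the tallies to percentages, one per document version.
      const rates = new Map<number, number>();
      for (const [version, t] of totals) {
        rates.set(version, (100 * t.right) / t.all);
      }
      return rates;
    }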

Admittedly, good questions are hard to write. The author might not be the best person to think up questions, because the author's choices might reflect what's on the surface of the document rather than the more crucial underlying concepts. In any case, wording is critical--because a single poorly chosen word could render a question ambiguous or confusing--so quiz questions should be tested on a sample of the readership. Luckily, each document can be tested reliably with just a handful of questions.

Cross-reference management

The amount of content you can get online is unimaginably huge. The generosity of the public is copious, and popular software often draws dozens of explanations, all posted freely.

In fact, the new tragedy of the commons is oversupply. Every week or so I hear someone musing, "I ought to blog more." These people have heard that, in an increasing number of fields, you don't exist for employers or potential collaborators unless you have a powerful online presence. I understand this too and encourage these colleagues to blog, but I ask, "What do you have to say?" For all too many bloggers, that's not a criterion.

Of course, the oversupply of content is just the flip side of an undersupply that we know as the traditional tragedy of the commons. The undersupply in this case is our time and attention.

This abundance of blogs, web pages, and mailing lists, along with the decentralization of the Internet, makes content discovery hard. Reputation and rating systems, which I've dealt with in other articles, might provide part of the solution--if we can develop reliable systems and get people to use them. But before that, we have to make sure readers can find the documents they need.

Even when found, documents may be of little use because the reader lacks the background needed to understand them. Often a reader gives up after finding that the document requires knowledge of an unfamiliar tool, or after trying out a procedure that doesn't work because the reader is expected to alter the procedure to match his or her particular working environment.

Some sites contain pointers to prerequisites and follow-up documents. They may do this formally (through lists that appear near the top of the page) or informally ("If you need more information on this topic, read..."). But we could streamline the whole process by:

  1. Making it easy for readers to suggest prerequisites and follow-up documents
  2. Generating paths through documents so the potential reader has an entire syllabus

The first goal might be implemented like this: an author puts a form at the end of a document, requesting cross-references. A reader can enter a URL and a topic that it covers. The form can also let the reader indicate whether the suggested document should be read before the current one (in other words, it's more introductory) or after (in other words, it's more advanced). The author ultimately evaluates the suggestion; software assistance should also be available to make it easy to include a link in a document in a standard format. Finding the cross-references may also be a task where publishers can add value to a community.
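
In code, a suggestion might reduce to a record like the following; the field names and the rendered link format are illustrative only:

    // Hypothetical shape of a reader-submitted cross-reference suggestion.
    type Direction = "prerequisite" | "followUp";

    interface CrossRefSuggestion {
      sourceUrl: string;    // the document the reader was viewing
      suggestedUrl: string; // the document being recommended
      topic: string;        // e.g. "DOM"
      direction: Direction; // read before (prerequisite) or after (follow-up)
    }

    // After the author approves a suggestion, a tool could emit the link in a
    // standard, machine-readable format such as this one.
    function renderCrossRef(s: CrossRefSuggestion): string {
      const label = s.direction === "prerequisite" ? "Background" : "Next steps";
      return `<a rel="${s.direction}" data-topic="${s.topic}" href="${s.suggestedUrl}">${label}: ${s.topic}</a>`;
    }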

The second goal requires a protocol--comparable to RSS or Atom--that can be used by tools to crawl documents and produce simple graphs. For instance, a document about web page layout might refer back to pages about HTML, the DOM model, and CSS. The document about the DOM model might refer to pages about XHTML, which in turn refer back to pages about XML. Tags or keywords associated with each link (such as "DOM") allow users to associate multiple pages with a topic, providing alternative documents and paths for people interested in that topic.
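
To make the protocol concrete, each document could expose an entry like the following for crawlers to harvest; the shape is invented, loosely in the spirit of an Atom feed entry:

    // Hypothetical per-document metadata a crawler could harvest.
    interface DocLink {
      href: string;                     // URL of the related document
      rel: "prerequisite" | "followUp"; // direction of the relationship
      topics: string[];                 // tags such as ["DOM"] or ["CSS"]
    }

    interface DocEntry {
      url: string;
      title: string;
      links: DocLink[];
    }

    // Example: the page-layout document described above, pointing back at
    // its prerequisites.
    const layoutDoc: DocEntry = {
      url: "https://example.org/web-page-layout",
      title: "Web Page Layout",
      links: [
        { href: "https://example.org/html-basics", rel: "prerequisite", topics: ["HTML"] },
        { href: "https://example.org/dom-intro", rel: "prerequisite", topics: ["DOM"] },
        { href: "https://example.org/css-intro", rel: "prerequisite", topics: ["CSS"] },
      ],
    };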

The crawler finds the trail among the documents and generates a list of possible paths for the reader to follow. Now the reader can choose whatever background he or she needs. One web designer wants to stick to simple HTML, while another wants a more robust page conforming to the DOM model. Each can find a good starting point.
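
Given such entries, generating candidate paths is a graph walk backwards along prerequisite links. A minimal sketch, reusing the hypothetical DocEntry shape from above:

    // Follow prerequisite links backwards from a target document and emit
    // every reading order that ends at the target.
    function readingPaths(
      target: string,
      entries: Map<string, DocEntry>,
      seen: Set<string> = new Set()
    ): string[][] {
      if (seen.has(target)) return []; // guard against cycles
      const prereqs =
        entries.get(target)?.links.filter((l) => l.rel === "prerequisite") ?? [];
      if (prereqs.length === 0) return [[target]]; // a possible starting point
      const paths: string[][] = [];
      for (const p of prereqs) {
        for (const sub of readingPaths(p.href, entries, new Set([...seen, target]))) {
          paths.push([...sub, target]); // read the prerequisite first
        }
      }
      return paths;
    }

A reader who already knows HTML would simply pick a path that starts further along.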

Projects, social networks, and self-organizing communities could maintain portals, and perhaps even deploy their own crawlers, that record recommended documents and display the paths between them.

When document A adds a link to document B, the author of document B should receive a ping asking him or her to reciprocate. A link that both sides agree on is considered much more reliable than a link made by just one side; after all, a lot of authors would like to link to popular documents, hoping to ride on their coattails. So if the author of document B says, "yes, make this link," it's considered highly reliable. If the author simply fails to respond, the link should still receive some consideration but be rated as less reliable. And if the author says, "No, this link is not appropriate," crawlers should reject it.
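
Those three outcomes might map to link weights roughly as follows; the numbers are placeholders, not a calibrated scale:

    // Hypothetical reliability weighting based on the reciprocation ping.
    type PingResponse = "confirmed" | "noResponse" | "rejected";

    function linkReliability(response: PingResponse): number | null {
      switch (response) {
        case "confirmed":
          return 1.0;  // both sides agree: highly reliable
        case "noResponse":
          return 0.4;  // one-sided: still counted, but discounted
        case "rejected":
          return null; // crawlers should drop the link entirely
      }
    }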

Thus, the protocol has relevance ratings attached to each link, and these can be displayed by crawlers and the portals that include the resulting paths. In addition to relevance assigned by authors on both sides of the link, ratings can be influenced by direct user comments and by records of the number of times the links were followed. One could consider multiple scales for rating documents (such as quality, relevance, and degree to which they are up-to-date) but such plans could quickly become too complex for average readers.

Designing tools and protocols for generating documentation paths could be complex, but the payoffs are incalculable. The rowdy, disorganized web could reclaim one of the key advantages of a book--its logical organization. Readers would no longer be lost in hyperspace.

Furthermore, the system would encourage more documentation. Good authors could expect to get more readers, garnered through appropriate links from popular documents. Gaps in documentation would also be revealed, and aspiring authors could make themselves useful by focusing on places where the need is greatest.

Community members' efforts to inform themselves and their peers are vibrant and exciting. The energy and dedication of contributors is beyond doubt. But we are not making the most of their talents and efforts. The field has reached the point where it needs some formal tools and practices to move forward and be a true educational resource for the twenty-first century.



Comments: 9

  Matt Garland [01.31.08 08:02 PM]

Two compelling frontiers. Thanks, Andy. Here are some small user-centric add-ons.

1) The quizzes should be multiple choice, and each answer should prompt an explanation of why it is right or wrong. That way, the quizzes are more explicitly a review for the user, and increase her motivation to take them. (For the same reason, a user should be allowed to try every question-answer pair after the first, consequential answering.)

2) If users had access to a literal graph of ALL possible paths, and a way to quickly sample content from that graph (like a Netflix rollover for each node/button), they could blaze their own path if a suggested path did not appeal to them.

Matt

  Kin Lane [01.31.08 08:38 PM]

Rating, cross referencing and META tagging of content is crucial for us to continue organizing this mess of information we have created.

Definitely needs to be a group effort to make sure we standardize our platforms to work together and help the quality content float to the top.

I think some of the leaders like WordPress are helping standardize platform offerings.

  Scott Gray [01.31.08 09:23 PM]

If you haven't read it, pick up a copy of "The Nurnberg Funnel: Designing Minimalist Instruction for Practical Computer Skill (Technical Communication, Multimedia, and Information Systems)" by John Carroll.

Some of the things you're suggesting sound like suggestions from that book.

According to Carroll, virtually every study done on software documentation supports "minimal documentation," which suggests that users learning software don't like to read but instead prefer to plunge into the system. Users tend to look for information that matches what they are trying to accomplish.

Carroll suggests that we eliminate information that gets in the way of learning. Most of the time we include too much information in manuals and documentation because we don't know what the user needs to know or what they're trying to do.

Consider applying some of Carroll's minimalist principles as you build a design around what the users want to accomplish, how they will proceed, and how they will make errors and recover from them:

  • Allow users to get started fast: take an action-centered (or user-centered) approach by giving users enough information to get their real tasks done right away. Don't try to cover every function; focus on the users' actions and not the product's functions. Get users engaged quickly by omitting long introductions and cutting down on repetition and verbiage.
  • Rely on users to think and improvise: provide enough information so the users will explore on their own and discover solutions to specific problems.
  • Exploit what people already know: use metaphors and similes to help users relate and learn.
  • Support error recognition and recovery: errors can't be avoided, but you can provide error information that supports error detection, diagnosis, and recovery.

Accordingly, it would be nice if the tools themselves could integrate with the type of documentation you're talking about. For instance, if I'm using Eclipse and I'd like to perform a particular task, I'd like to search from within Eclipse and, if I find an example someone has created, be able to load it directly into Eclipse. Or if I hit an error, an automatic search for that error would be performed and likely solutions of the type you're discussing would be available to me.

  Scott Gray [01.31.08 09:39 PM]

The NICE thing about all this disorganized documentation is that you can almost always find someone who had the same problem you did and found a solution you can use.

Most documentation is written from the perspective that people learn from documentation for learning's sake...they don't...they have some particular problem they're trying to solve and want a solution as quickly as they can get it. First they just try to figure it out themselves, then they go to the internet to try to find it. For now, that solution seems to be best found by using Google.

I don't think people looking for solutions will want to take the time to take a quiz to see if they can solve their problem. I think they'd rather just see if they can solve their problem by solving it.

  moya [01.31.08 10:50 PM]

really intriguing ideas; thanks andy.
regarding cross-referencing, what is the potential of social bookmarking as a plausible tool for sorting through online information? i've also been wondering about the potential of recent web2summit wunderkind Twine for social-networking/bookmarking/semantically scraping in order to "cross-reference" given topics along multiple axes.

  Jane Hadley [02.01.08 01:34 AM]

Great ideas here. But, as somebody who is mostly self-taught and uses online and offline documentation a lot, my opinion is that there is no substitute for excellent search, tables of contents, and comprehensive, thorough indexes.

Mr. Carroll's hunch about people plunging into a project and looking for specific information applies to much that I do. When I go to documentation, I'm using search, tables of contents and/or indexes to quickly find what I want.

  Andy Oram [02.01.08 08:33 AM]

Thanks for the great comments. (And yes, I've read several articles by John Carroll, although I haven't read his book yet.)

Carroll inspired and influenced me, but he seemed to be testing mostly end-user systems with graphical interfaces (fairly novel for most people in the 80s) and I don't believe the "minimal" documentation you need for most systems is very small.

There are two ways to go with almost any hyperlink technology: ask users to make connections manually, or use a sophisticated tool, such as Twine appears to be. Of course, we can do both. Twine can add choices and tweak ratings.

I know the history of the semantic web: asking users to add meta-information doesn't work as well as developing better search engines and other tools to do the work automatically. Twine looks like it makes good use of the work users do, and builds on it.

I think user recommendations are valuable for creating document paths. I tried to suggest a system that requires very little effort for a big positive impact. I'd like to see whether something like Twine can help too.

  bowerbird [02.03.08 12:56 AM]

sorry andy, but scott gray is directly on-target:
find out what they want to do, then show them how
and tell 'em how they shoulda asked the question.

-bowerbird

  laurie [02.03.08 12:22 PM]

The importance of cross-reference management is a big deal for many fields, and some of the existing organizational structures could be improved and then mapped to the appropriate cross-reference management system/methods if they were readily available. I work in a digital library center and we're creating tons of great digital content, but then we need to make that content usable.

The existing documentation for traditional libraries has been focused on opening the closed content, closed to physical access and within closed databases. Moving the traditional systems of organization (with so many subject guides, class guides, how-to guides, and other documentation types) into a form that works for both closed and open systems is difficult, as is figuring out how to make these modular and reusable since so many have always been developed in relation to the particularities of the physical library or the multiple closed databases. Reformatting the information on the subject area or field and then showing how the resources support the larger inquiry questions is a lot of work, and it would be ideal to have a great cross-referencing system to make this work usable for other libraries and for other fields since so much of the information could be useful for anyone. Then, the modular information could be used for traditional library and academic goals (research, class projects) and for explaining and organizing larger structures or research and data.
