Augmented reality and the ultimate user manual

Most user manuals are worthless. They’re chock full of poorly written text and confusing diagrams. Worse still, the gap between problem and solution is vast because we’re forced to apply a linear format (a guide) to a specific question. Where’s a search box when you need it?

But here’s an idea: What if instead of leafing through pages or scrolling through an online manual, you could simply see your way through a task? Just slide on a headset and work your way through a bit of customized, augmented-reality education.

That’s what Columbia University computer science professor Steve Feiner and Ph.D. candidate Steve Henderson are trying to do with their Augmented Reality for Maintenance and Repair (ARMAR) project. They’re combining sensors, head-worn displays, and instruction to address the military’s maintenance needs. Take a look at the project video and you’ll quickly see how the same application could extend to all sorts of use cases.

In the following Q&A, Feiner and Henderson discuss the genesis of ARMAR and its practical applications. They also offer a few tips for anyone who wants to develop their own AR-based instructional project.

Mac Slocum: What inspired ARMAR?

Steve Feiner: ARMAR was inspired in part by earlier research projects that we have done in Columbia’s Computer Graphics and User Interfaces Lab, investigating how augmented reality could be used for maintenance and assembly tasks.

This work dates back to 1991, when we began work on KARMA (Knowledge-Based Augmented Reality for Maintenance Assistance). The earliest work on ARMAR itself began in 2006, with initial funding from the U.S. Air Force Research Lab, when Steve Henderson began his Ph.D. studies at Columbia.

Our application domain of the LAV-25 light armored vehicle turret was the result of funding from the U.S. Marine Corps Logistics Base, beginning in 2007, to investigate how AR might be applied to future field maintenance of military vehicles.

MS: Is ARMAR in active use?

Steve Feiner: ARMAR is a research project and has not been deployed.

MS: Can you walk me through the ARMAR user experience?

Steve Henderson: The user can see five kinds of augmented content presented on the see-through head-worn display:

Attention-directing information in the form of 3D and 2D arrows, explaining the location of the next task to perform.
Text instructions describing the task and accompanying notes and warnings.
Registered labels showing the location of each target component and surrounding context.
A close-up view depicting a 3D virtual scene centered on the target at close range and rendered on a 2D screen-fixed panel.
3D models of tools (e.g. a screwdriver) and task domain components (e.g. fasteners or larger components), if applicable, registered at their current or target locations in the environment.
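
To make the five content types concrete, here is a minimal Python sketch of how a single maintenance step might be described as data. It is not ARMAR's code: every class and field name below is an assumption made for illustration, and the real system may organize its content quite differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]  # a point in the tracked workspace (hypothetical units)


@dataclass
class Label:
    """A registered label anchored to a point on the equipment."""
    text: str
    anchor: Vec3


@dataclass
class Model3D:
    """A tool or component model placed at its current or target location."""
    name: str
    position: Vec3


@dataclass
class TaskStep:
    """Everything the display might show for one maintenance step."""
    target: Vec3                            # where the 2D/3D arrows point
    instruction: str                        # text describing the task
    warnings: List[str] = field(default_factory=list)
    labels: List[Label] = field(default_factory=list)
    closeup_center: Optional[Vec3] = None   # center of the close-up panel, if any
    models: List[Model3D] = field(default_factory=list)

    def content_kinds(self) -> List[str]:
        """Names of the content kinds this step has data for."""
        kinds = ["arrows", "text"]  # every step has a target and an instruction
        if self.labels:
            kinds.append("labels")
        if self.closeup_center is not None:
            kinds.append("closeup")
        if self.models:
            kinds.append("models")
        return kinds


step = TaskStep(
    target=(0.4, 1.1, 0.2),
    instruction="Remove the access panel screws.",
    models=[Model3D("screwdriver", (0.4, 1.1, 0.25))],
)
print(step.content_kinds())  # ['arrows', 'text', 'models']
```

Keeping the tool models optional mirrors the "if applicable" in the list above. Making labels and the close-up optional as well is a choice of this sketch, not something the interview states.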

MS: What tools and technologies does it employ?

Steve Henderson: The initial implementation of ARMAR was built as a game engine mod using the Valve Source Software Development Kit. Over the past semester, ARMAR has been reimplemented using Goblin XNA, our lab’s open-source platform for developing augmented reality applications.

Steve Feiner: We also take advantage of a wide range of head-worn displays and tracking systems available in Columbia’s Computer Graphics and User Interfaces Lab. These include a custom video see-through head-worn display that Steve Henderson built specifically for use in the project (using a Headplay display and two Point Grey Firefly MV cameras), a Vuzix iWear VR920 with CamAR video see-through head-worn display, and an NVIS nVisor ST 60 optical see-through head-worn display. The tracking technologies that we use include InterSense IS900 and IS1200 hybrid trackers, NaturalPoint OptiTrack IR optical tracking, and the VTT ALVAR optical marker tracking package.

We typically run the application and drive the head-worn display from a desktop PC with an NVIDIA Quadro FX 4500 graphics card. When applicable, we run the NaturalPoint OptiTrack on a separate laptop. But there’s no reason why the application itself couldn’t run on a high-end laptop.

In addition, there are now wireless HDMI solutions that could replace the cable between the computer and the head-worn display, eliminating the physical tether to the computer.

ARMAR is a research testbed, and not a ready-to-deploy production system. Therefore, we are free to explore different combinations of technologies, without having to commit to them as part of a turnkey solution.

MS: The video shows what appears to be the G1 mobile phone. Is that an input device?

Steve Henderson: The Android G1 phone is used as a wrist-worn controller that displays a simple set of 2D controls and detects user gestures made on the touch screen. Gestures are streamed to the computer running ARMAR through Wi-Fi. The G1 allows the user to move between maintenance steps, and control the explanatory animations that the system can present — starting and stopping them, and changing the speed at which they play.
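
The controller's role described here (moving between steps, starting and stopping animations, changing their speed) can be sketched as a small state update driven by incoming messages. The Python below is only an illustration under stated assumptions: the interview does not describe the message format, so the command names (`next`, `prev`, `play`, `pause`, `speed`) and the `PlaybackState` type are hypothetical, and the Wi-Fi transport is left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class PlaybackState:
    step: int = 0          # index of the current maintenance step
    playing: bool = False  # whether the explanatory animation is running
    speed: float = 1.0     # animation speed multiplier


def apply_command(state: PlaybackState, command: str, step_count: int) -> PlaybackState:
    """Apply one controller message to the state and return it.

    The message names are hypothetical. Unknown or malformed
    messages are ignored rather than raising an error.
    """
    parts = command.strip().split()
    if not parts:
        return state
    name = parts[0].lower()
    if name == "next":
        state.step = min(state.step + 1, step_count - 1)
        state.playing = False  # a newly selected step starts with its animation stopped
    elif name == "prev":
        state.step = max(state.step - 1, 0)
        state.playing = False
    elif name == "play":
        state.playing = True
    elif name == "pause":
        state.playing = False
    elif name == "speed" and len(parts) == 2:
        try:
            value = float(parts[1])
        except ValueError:
            return state
        if math.isfinite(value) and value > 0:
            state.speed = value
    return state


state = PlaybackState()
for message in ["play", "speed 0.5", "next", "next", "next"]:
    apply_command(state, message, step_count=3)
print(state)  # PlaybackState(step=2, playing=False, speed=0.5)
```

Clamping the step index and ignoring bad input are design choices of this sketch: a wrist-worn touch screen is easy to brush by accident, so a stray message should never crash the display or skip past the last step.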

MS: How small can you make ARMAR?

Steve Feiner: Our emphasis has been on developing a research testbed in which we can design and formally evaluate the effectiveness of new ways to assist mechanics in learning and performing maintenance tasks. Therefore, we haven’t had to worry about choosing specific hardware on which a production-quality implementation could be fielded right now, let alone making it really small.

That said, Moore’s Law, in concert with competitive hardware development and strong consumer demand for ever smaller and more powerful devices that can support 3D games, is driving down the size and cost of the mobile devices on which ARMAR and its descendants will be able to run. And, the capability for transmitting wireless high-resolution video could also help eliminate the need for cables to/from the head-worn display, eventually allowing the system to use eyewear that looks much like current glasses. These could be connected wirelessly to a small smartphone-sized waist-worn computer, or even to a nearby stationary computer whose size then becomes much less important.

MS: Could something like ARMAR be ported to mobile phones? Could it exist as an app?

Steve Henderson: Yes. But, note that an app that used a current mobile phone’s built-in camera and display, held in the user’s hand, won’t accommodate many tasks in which the maintainer needs to devote both hands to the task itself. As mobile phones mature, however, we believe they will soon be designed to interface with — or even be built into — tracked eyewear, making them an ideal platform for ARMAR.

MS: What’s been the most challenging aspect of development?

Steve Henderson: It’s been challenging to track the user’s head within the cramped confines of the turret. We do not have a full replica of the turret in our lab, and were not able to permanently install any tracking infrastructure in the actual turrets where we did our studies.

Using stereo video see-through head-worn displays under Direct3D has also been challenging. There are no explicit provisions for stereo in Direct3D and the formal support for stereo displays provided by graphics card vendors does not address merging rendered graphics with separate left-eye and right-eye video. We were lucky to have NVIDIA provide us with an unsupported software development kit for handling this on their graphics cards.
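
For readers unfamiliar with the problem, the per-eye merge itself is conceptually simple: each eye's rendered graphics are blended over that eye's camera frame. The Python sketch below shows only that idea on the CPU with tiny in-memory images. It does not use Direct3D or the NVIDIA kit mentioned above, and the function names and pixel layout are assumptions made for illustration.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]
Frame = List[List[RGB]]      # camera image: rows of opaque pixels
Overlay = List[List[RGBA]]   # rendered graphics: rows of pixels with alpha


def composite(video: Frame, overlay: Overlay) -> Frame:
    """Alpha-blend a rendered overlay onto a camera frame of the same size."""
    out: Frame = []
    for video_row, overlay_row in zip(video, overlay):
        row: List[RGB] = []
        for (vr, vg, vb), (r, g, b, a) in zip(video_row, overlay_row):
            alpha = a / 255.0
            row.append((
                round(r * alpha + vr * (1 - alpha)),
                round(g * alpha + vg * (1 - alpha)),
                round(b * alpha + vb * (1 - alpha)),
            ))
        out.append(row)
    return out


def composite_stereo(left_video: Frame, right_video: Frame,
                     left_overlay: Overlay, right_overlay: Overlay) -> Tuple[Frame, Frame]:
    """Blend each eye's overlay onto that eye's own camera frame."""
    return composite(left_video, left_overlay), composite(right_video, right_overlay)


gray = [[(100, 100, 100)]]        # a 1x1 camera frame
opaque_red = [[(255, 0, 0, 255)]]  # fully opaque graphics
clear = [[(0, 0, 0, 0)]]           # fully transparent graphics
left, right = composite_stereo(gray, gray, opaque_red, clear)
print(left, right)  # [[(255, 0, 0)]] [[(100, 100, 100)]]
```

The hard part Henderson describes is doing this on the graphics card, in step with a stereo display, which this sketch does not attempt.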

MS: Has anything gone smoother than you anticipated?

Steve Henderson: Our recent reimplementation of ARMAR using the Goblin XNA framework has gone very smoothly. Our initial prototype design, which leveraged the Valve Source software development kit, required custom implementations of several core functions needed for augmented reality applications (e.g., tracking and camera control). Goblin XNA provides these functions out of the box, which has allowed us to spend more time on the design of the actual augmented reality interface. Additionally, implementation of the wrist-worn controller was very straightforward using the Android Software Development Kit and Eclipse Integrated Development Environment.

MS: Do you see applications in other industries?

Steve Feiner: There are many potential applications of AR to explaining industrial tasks, in both training and production. Essentially, it could be used in any domain in which personnel use conventional documentation, ranging from paper manuals to computer-based electronic manuals.

MS: How about consumer use?

Steve Henderson: There are many day-to-day tasks in which consumers currently need to consult written or computer-based instructions. Think of assembling a bicycle or a piece of furniture, making a complex recipe, wiring a home entertainment center, or fixing a balky lawnmower. These are just some examples of tasks in which systems like ARMAR could make the task easier and faster to perform, and make it more likely that it’s performed correctly.

MS: If someone wants to pursue a similar project, what guidance would you give them? What should they watch out for? Where should they start?

Steve Feiner: It’s important to be aware of, learn from, and build on relevant ongoing and past work. Researchers have been exploring AR and publishing their work for over 40 years, beginning with Ivan Sutherland’s research on head-tracked see-through head-worn displays.

The leading conference in this field — the IEEE International Symposium on Mixed and Augmented Reality, and its direct predecessors — dates back to 1998. So, we would strongly recommend that someone who wanted to develop a similar project (or, for that matter, any AR project) become familiar with what others have done before, to find out what worked and what didn’t.

It’s also important to have a close working relationship with subject-matter experts in the field in which the application will be developed and to be able to run user tests with the members of the population for whom the system is being designed.

MS: What’s the next step in making this technology more widely available?

Steve Feiner: In the work we reported on at IEEE ISMAR 2009, we showed that AR let users locate the maintenance tasks they needed to perform more quickly than state-of-the-art electronic documentation did. We’re now concentrating on improving users’ speed and accuracy in performing tasks that involve orienting and positioning parts during assembly and disassembly. Making the technologies on which we’re working available to others will involve additional funding to address other domains and to make robust production implementations of the software.
