See It, Follow It

Before our eyes and minds can “follow” something in our environment, we must first detect it. Similarly, before an AR application can “augment” something, the target object or place needs to be detected. Strictly speaking, our eyes can’t detect a geo-location, but a GPS-enabled device can detect where it is relative to other points on the globe.

Since most of the world’s people, objects and places do not emit radio signals which our mobile Internet devices can reliably detect, as early visions of RFID once imagined, other technologies are being used, and new ones developed, for detection in AR applications. Further, even if there were tags on us (or other moving objects) and readers everywhere, RFID alone is insufficient to provide the six degrees of freedom (three for position, three for orientation) necessary to correctly position a device relative to the object or point of interest. This isn’t to say that RFID has no place at all in AR, just that it is not a widely applicable tool for developers of today’s consumer AR applications.

Tracking for AR applications involves three steps: identifying one or more targets in the user’s field of vision or surroundings; keeping track of the position of the user’s device relative to the recognized and/or selected object in three-dimensional space; and, for there to be an augmentation in the field of view, properly “registering” an overlay image or text to the real-world object. The first two of these steps are closely aligned with the sequence some types of robots need to perform when moving autonomously in an environment. They also leverage core ubiquitous computing technologies which are necessary in “intelligent environments,” that is, spaces which exhibit Ambient Intelligence.
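To make those three steps concrete, here is a minimal sketch of the loop an AR application runs, written in Python. The function names (detect_target, estimate_pose, render_overlay) are illustrative placeholders, not any particular library’s API:

```python
# A sketch of the detect -> track -> register loop, assuming hypothetical
# helper functions; this is not any particular library's API.

import numpy as np

def detect_target(frame: np.ndarray):
    """Step 1: find the target in the camera frame; return matched
    2D feature points, or None if the target is not visible."""
    raise NotImplementedError  # e.g., marker decoding or natural-feature matching

def estimate_pose(points_2d: np.ndarray, model_points_3d: np.ndarray,
                  camera_matrix: np.ndarray):
    """Step 2: recover the device's six-degree-of-freedom pose (rotation
    and translation) relative to the target from 2D-3D correspondences."""
    raise NotImplementedError

def render_overlay(frame: np.ndarray, pose) -> np.ndarray:
    """Step 3: project the overlay into the frame using the pose, so the
    augmentation stays "registered" to the real-world object."""
    raise NotImplementedError

def ar_loop(camera, camera_matrix, model_points_3d):
    for frame in camera:                  # one iteration per video frame
        points_2d = detect_target(frame)
        if points_2d is None:
            continue                      # nothing recognized in this frame
        pose = estimate_pose(points_2d, model_points_3d, camera_matrix)
        yield render_overlay(frame, pose)
```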

Tracking real-world objects which are stationary (with fixed geo-location coordinates) has been achieved most widely, and at relatively low cost, using a mobile phone’s GPS and compass. There are many examples, such as Wikitude, Layar and BionicEye. But there are situations in which GPS and compass are not the best choice, for example when the user is inside a building or near something which disturbs the magnetic field. And even in the best of circumstances, GPS and compass don’t provide the speed and accuracy which many AR applications require.
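As a rough illustration of how a GPS-and-compass browser decides where a point of interest belongs on screen, here is a short Python sketch. The coordinates, heading and field of view are invented values, and real applications such as Wikitude or Layar certainly refine this basic geometry:

```python
# A minimal sketch of GPS/compass-based placement: compute the bearing from
# the device to a point of interest (POI) and compare it with the compass
# heading to decide where (or whether) the POI appears on screen.

import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from (lat1, lon1) to (lat2, lon2), degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def screen_offset(dev_lat, dev_lon, heading_deg, poi_lat, poi_lon, fov_deg=60.0):
    """Horizontal position of the POI as a fraction of screen width
    (0.5 = center), or None if it lies outside the camera's field of view."""
    diff = (bearing_deg(dev_lat, dev_lon, poi_lat, poi_lon)
            - heading_deg + 180.0) % 360.0 - 180.0
    if abs(diff) > fov_deg / 2.0:
        return None
    return 0.5 + diff / fov_deg

# Device in central Lausanne, looking due north at a POI slightly north-east:
print(screen_offset(46.5197, 6.6323, 0.0, 46.5230, 6.6335))
```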

Take, for example, applications in which the user’s target object is not fixed in space. This challenge has been solved for years by affixing a marker, such as a QR (Quick Response) or Data Matrix code, to the object. For the past three or four years, markers have provided a suitable approximation of what most people designing AR applications really want: recognition of people, objects or places on the basis of their unique features, or, in the research community vernacular, “natural feature recognition.”
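As a simple illustration of what marker detection gives a developer, here is a sketch using OpenCV’s built-in QR code detector (a modern convenience; a dedicated tracking library would fill this role in practice). The image path is illustrative:

```python
# A minimal sketch of marker-based detection with OpenCV's QR detector.
# The four corner points it returns are exactly what a pose estimator
# needs in order to anchor an overlay to the object.

import cv2

def find_marker(frame):
    """Return (decoded_text, corner_points) for a QR code, or None."""
    detector = cv2.QRCodeDetector()
    data, points, _ = detector.detectAndDecode(frame)
    if points is None:
        return None
    return data, points.reshape(-1, 2)  # 4 corners, ready for pose estimation

frame = cv2.imread("frame.jpg")  # a captured camera frame (illustrative path)
result = find_marker(frame)
if result:
    text, corners = result
    print("marker says:", text, "at corners", corners)
```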

Not surprisingly, tracking with feature recognition technology is currently receiving a great deal of attention in the scientific community and will remain a field of active research for user experience designers, computer vision experts and specialists in a variety of other domains for some time to come. Scientists from world-renowned centers of research on tracking will be presenting a few of their most recent achievements at ISMAR, the annual conference of the Mixed and Augmented Reality research community, later this month in Orlando.

In his paper, to be published in the conference proceedings, Vincent Lepetit, a researcher at the Computer Vision Lab (CVLab) at EPFL, will describe how his group has extended the ESM (Efficient Second-order Minimization) algorithm developed by researchers at INRIA in order to compensate for motion blur when tracking a 3D object. Another achievement of the CVLab in Lausanne is the ability to identify (to detect) a 3D object’s features without computationally intensive and time-consuming prior training.
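For readers who want a feel for what ESM does, here is the core idea in two lines, a sketch of the general method rather than of the motion-blur extension described in the paper:

```latex
% ESM in brief: align the current image with a reference template by
% minimizing the residual y(x) over warp parameters x.
\begin{align*}
  y(x_0) &\approx y(0) + \tfrac{1}{2}\bigl(J(0) + J(x_0)\bigr)\,x_0
         && \text{(second-order expansion)}\\
  x_0    &= -2\,\bigl(J(0) + J(x_0)\bigr)^{+}\,y(0)
         && \text{(update; } (\cdot)^{+} \text{ is the pseudoinverse)}
\end{align*}
% The trick: at the solution, the Jacobian J(x_0) can be computed from the
% gradient of the reference template, so the method converges like a
% second-order scheme without ever forming a Hessian.
```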

Also presenting at ISMAR will be researchers from the Institute for Computer Graphics and Vision at the Technical University of Graz (Austria), a group focusing specifically on algorithms for natural feature recognition of objects for AR applications on mobile phones. The group has broken a number of significant barriers in this field, notably with the Studierstube Tracker, a library for 2D code detection on the mobile handset for AR applications. Currently in its second phase, its Handheld AR project leverages past work at TU Graz and in other labs in the field of natural feature recognition.

In their ISMAR 2009 paper, Daniel Wagner, Dieter Schmalstieg and Horst Bischof of TU Graz describe a new technique for highly accurate, real-time pose estimation and tracking on mobile phones. Another of the many interesting projects on which the institute’s faculty and students are working is technology permitting utility workers to use their mobile handsets for “X-ray vision” of underground structures while in the field.
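To give a flavor of the pose-estimation step at the heart of such work, here is a generic sketch using OpenCV’s solvePnP on four detected corner points. The point coordinates and camera intrinsics are invented, and this is emphatically not the authors’ phone-optimized algorithm:

```python
# Generic 6-DoF pose estimation from 2D-3D correspondences: given where the
# corners of a known planar target appear in the image, recover the camera's
# rotation and translation relative to the target.

import numpy as np
import cv2

# Four corners of a 10 cm square target in its own coordinate frame (meters).
object_points = np.array([[-0.05, -0.05, 0.0],
                          [ 0.05, -0.05, 0.0],
                          [ 0.05,  0.05, 0.0],
                          [-0.05,  0.05, 0.0]], dtype=np.float64)

# Where those corners were detected in the image (illustrative pixel values).
image_points = np.array([[210.0, 260.0],
                         [430.0, 250.0],
                         [445.0, 470.0],
                         [205.0, 480.0]], dtype=np.float64)

# Pinhole intrinsics for an illustrative 640x480 camera (fx = fy = 500 px).
camera_matrix = np.array([[500.0,   0.0, 320.0],
                          [  0.0, 500.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)  # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
if ok:
    print("rotation (Rodrigues vector):", rvec.ravel())
    print("translation (meters):       ", tvec.ravel())
```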

RFID remains an interesting option to supplement other tracking technologies for indoor applications and situations which are relatively tightly controlled (e.g., teaching/training, museums, entertainment venues, architecture and urban planning). When RFID readers become more ubiquitous, as common as GPS and compass, for example, the technology could play a role in future AR applications by narrowing down the field of search and reducing the time required for a positive identification. Tracking for consumer AR applications in uncontrolled environments, when all the user has is a camera phone, remains a very, very challenging area of research, and we should expect to continue seeing major developments in this field in the year ahead before it is gradually integrated into our everyday AR applications.
