Voice in Google Mobile App: A Tipping Point for the Web?

As I wrote in “Daddy, Where’s Your Phone?”, it’s time to start thinking of the phone as a first-class device for accessing web services, not as a way of repurposing content or applications originally designed to be accessed on a keyboard and big screen. The release of speech recognition in Google Mobile App for iPhone continues the process begun with the iPhone itself, of building a new, phone-native way of delivering computing services. Here are two of the key elements:

Sensor-based interfaces. Apple wowed us with the iPhone’s touch screen, but the inclusion of the accelerometer was almost as important, and now Google has shown us how it can be used as a key component of an application user interface. Put the phone to your ear, and the application starts listening, triggered by the natural gesture rather than by an artificial tap or click. Yes, the accelerometer has been used in games like tilt and parlor amusements like the iPint, but Google has pushed things further by integrating it into a kind of workflow with the phone’s main sensor, the microphone.
This is the future of mobile: to invent interfaces that throw away the assumptions of the previous generation. Point and click was a breakthrough for PCs, but it’s a trap for mobile interface design. Right now, the iPhone and similar smartphones have an array of sensors: the microphone, the camera, the touchscreen, the accelerometer, the location sensor (GPS or cell triangulation), and yes, on many, the keyboard and pointing device. Future applications will surprise us by using them in new ways, and in new combinations; future devices will provide richer and richer arrays of senses (yes, senses, not just sensors) for paying attention to what we want.

Could a phone recognize the gesture of raising the camera up and then holding it steady to launch the camera application? Could we talk to the phone to adjust camera settings? (There’s a constrained language around lighting and speed and focus that should be easy to recognize.) Could a phone recognize the motion of a car and switch automatically to voice dialing? And of course, there are all the Wii-like interactions with other devices that are possible when we think of the phone as a controller. Sensor-based workflows are the future of UI design.
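
To make the "raise it, then hold it steady" idea concrete, here is a rough sketch of how such a gesture might be picked out of a stream of accelerometer readings. Everything in it is an assumption made for illustration: the function name, the thresholds, and the idea that readings arrive as (x, y, z) tuples in units of g. It is not how Google Mobile App or any real phone detects gestures.

```python
import math
from typing import Iterable, Tuple

# Hypothetical sample format: one accelerometer reading (x, y, z), in g.
Sample = Tuple[float, float, float]


def detect_raise_and_hold(
    samples: Iterable[Sample],
    move_threshold: float = 0.3,    # assumed: change that counts as "moving"
    steady_threshold: float = 0.05, # assumed: change that counts as "steady"
    hold_samples: int = 5,          # assumed: steady readings needed in a row
) -> bool:
    """Return True once a burst of motion is followed by a steady hold."""
    moved = False
    steady_run = 0
    prev = None
    for sample in samples:
        if prev is not None:
            # Size of the change between consecutive readings.
            delta = math.dist(sample, prev)
            if delta >= move_threshold:
                # Big change: the phone is being moved; restart the hold count.
                moved = True
                steady_run = 0
            elif delta <= steady_threshold:
                # Small change: only counts as a hold after some motion.
                if moved:
                    steady_run += 1
                    if steady_run >= hold_samples:
                        return True
            else:
                # In-between jitter breaks the hold.
                steady_run = 0
        prev = sample
    return False


# Lying flat, lifted in two steps, then held still for six readings.
lifted = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.5, 0.8), (0.0, 1.0, 0.1)]
lifted += [(0.0, 1.0, 0.1)] * 6
print(detect_raise_and_hold(lifted))
```

A phone that never moves does not trigger the gesture, which is the point: the application responds to the motion followed by the pause, not to a tap. A real detector would also need to look at orientation and filter out noise.
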
Cloud integration. It’s easy to forget that the speech recognition isn’t happening on your phone. It’s happening on Google’s servers. It’s Google’s vast database of speech data that makes the speech recognition work so well. It would be hard to pack all that into a local device.
And that of course is the future of mobile as well. A mobile phone is inherently a connected device with local memory and processing. But it’s time we realized that the local compute power is a fraction of what’s available in the cloud. Web applications take this for granted — for example, when we request a map tile for our phone — but it’s surprising how many native applications settle themselves comfortably in their silos. (Consider my long-ago complaint that the phone address book cries out to be a connected application powered by my phone company’s call-history database, annotated by data harvested from my online social networking applications as well as other online sources.)

Put these two trends together, and we can imagine the future of mobile: a sensor-rich device with applications that use those sensors both to feed and interact with cloud services. The location sensor knows you’re here so you don’t need to tell the map server where to start; the microphone knows the sound of your voice, so it unlocks your private data in the cloud; the camera images an object or a person, sends it to a remote application that recognizes it, and retrieves relevant data. All of these things already exist in scattered applications, but eventually, they will be the new normal.

This is an incredibly exciting time in mobile application design. There are breakthroughs waiting to happen. Voice and gesture recognition in the Google Mobile App is just the beginning.