

Dec 10

Tim O'Reilly


The Future of Cell Phone Headsets

There are some interesting speculations in an O'Reilly Network article by Peter Drescher entitled The Annoying Future of Cell Phone Headsets. The predictions start about halfway down page three of the article, and focus on the rise of stereo headsets for phones (as in the iPhone):

Until recently, talking on the phone was, without exception, a monaural experience. Even now, I almost always pull out one earbud out when I'm on a call. But the case of "listening to music, then the phone rings" is so common you quickly get used to the schizophrenic feeling of the voice in your head. In fact, it can even make you feel more connected to your caller, and facilitate communications in high-noise environments, like, say, every street-corner call you've ever made.

Stereo headphones create an audio barrier around your head. The world goes silent (or at least gets a lot quieter), and you navigate through the environment with your own soundtrack. But with stereo headsets, people who have your phone number can now pierce that barrier and join you inside it (and in the exact center of it). If your caller is also wearing a stereo headset, it's as if your bubbles are connected.... You're inside of their head, and they're inside of yours.

The article goes on to suggest some of the new social behavior (and supporting applications) that will start to take hold when stereo Bluetooth headsets are the norm:

  • Better sounding phone calls. "There's no reason why the headset can't produce full-resolution voice audio, since it's already doing it for music playback."

  • Better conference calling. "In a mobile broadband world, you could receive multiple streams of conferenced calls and position them in the stereo field for increased intelligibility."

  • Sharing audio, with conversation. "Imagine if I could authorize your headset to pick up my phone's audio signal, then we could both listen to what my phone was playing.... These headsets have built-in microphones, so there's no reason why you couldn't mix your voice into the shared music stream. Then I can talk to you, you can talk to me, and we can both still hear the music."

  • Sharing game sound. "Speaking of 3D audio, let's use that feature in a mobile Star Wars game to send those damn Imperial TIE-fighters buzzing around your head like flies, giving you more reason to swat them out of the sky. Then you can switch to multiplayer mode and contact the rest of your squadron. Now you're bantering via voice data network with Red Leader on your left and Red 5 on your right, all while blasting spaceship formations in coordinated attacks."
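The conference-calling idea in the list above, positioning each caller at a different spot in the stereo field, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`pan_gains`, `spread_positions`, `mix_conference`) are hypothetical, the callers are spread evenly from left to right, and constant-power panning is one common choice rather than anything the article specifies.

```python
import math


def pan_gains(position):
    """Return (left, right) gains for a pan position.

    position runs from -1.0 (hard left) to 1.0 (hard right).
    Constant-power panning keeps perceived loudness roughly even.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0  # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def spread_positions(count):
    """Evenly space `count` callers across the stereo field."""
    if count < 1:
        raise ValueError("need at least one caller")
    if count == 1:
        return [0.0]  # a single caller sits in the center
    return [-1.0 + 2.0 * i / (count - 1) for i in range(count)]


def mix_conference(streams):
    """Mix mono streams (lists of float samples) into (left, right)."""
    if not streams:
        return [], []
    length = max(len(stream) for stream in streams)
    left = [0.0] * length
    right = [0.0] * length
    positions = spread_positions(len(streams))
    for stream, position in zip(streams, positions):
        gain_left, gain_right = pan_gains(position)
        for i, sample in enumerate(stream):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right
```

With three callers, the first lands hard left, the second in the center, and the third hard right. A real headset would presumably use richer 3D cues than simple left/right panning.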

The article concludes with a compelling vision of a likely future:

I'm looking at wireless stereo headsets, and thinking that as they become more comfortable, more useful, more powerful, more commonplace, and more stylish, there will be fewer and fewer reasons to ever take them off. Eventually, you'll just stick them in your ears and forget about 'em. They will become like acoustic contact lenses, or a heads-up display for your ears. They'll let you access and control a virtual audio reality that streams in from wireless networks all around you and is mixed with voice data from your phone and from everybody's phone. And although the ubiquitous audio network I'm describing does not yet exist, you can actually listen to what it might sound like today.

It's completely analogous to being in a recording studio, isolated by big headphones, auditioning multiple tracks, and talking to the control room via live mic. I remember my first time in a real studio: I put on the cans and was astounded by the sense of space, the detailed audio field, and the sound of my own voice — in my head, through the mixing board. Now imagine that feeling as a mobile experience, but instead of talking to the engineer on the other side of the glass, you're walking down Broadway, talking to someone on the other side of the world.

I'm sure that at first, when only a few people are living in the mobile "heads up" auditory network, they will be quite "annoying" in public spaces, but eventually, I imagine we'll figure out how to deal with that. There's a lot that's compelling in this vision. I've always imagined heads-up visual displays being one of the harbingers of the era of wearable computing, but Peter makes a pretty compelling case that it's in audio that we're going to see the first signs of ubiquitous wearable computing.

tags: audio, future, nff, wearable  | comments: 18



Ross Stapleton-Gray   [12.10.07 08:04 AM]

Two points to note: as of July 1, California residents will be required by law to use hands-free technology when using cell phones while driving, so I'd expect a big jump in the pervasiveness of same (it'll be the reason I get it, if then); and as yet another promiscuous wireless technology, Bluetooth adds to our detectability/trackability. We're creating little wireless auras that will be as much a part of our outward appearance as our looks and sounds.

Steve Ganly   [12.10.07 08:56 AM]

Multi-party conference calls will be interesting with stereo headsets: one person front left, another rear right, etc.

And you could have a lot of fun with answerphone messages (plane does a flyby overhead...)

Mark Blafkin   [12.10.07 09:10 AM]

"Now imagine that feeling as a mobile experience, but instead of talking to the engineer on the other side of the glass, you're walking down Broadway, talking to someone on the other side of the world."

...right before you are liquified by the truck driver who is equally immersed in his conversation with his wife back in Newark.

...right before the mugger slides up behind you and sticks a gun in your ribs.

This is a very interesting case for wearable audio computing, but the 'annoyance factor' will be the least problematic issue created in this vision of immersive audio environments.

Another issue to consider:

At what point do public places stop being public? If everyone in Times Square is mentally somewhere else, is it still a public place?

If you want to be that immersed in another conversation, place, etc... why are you bothering going into the public?

If the

John Baxter   [12.10.07 10:00 AM]

One reason to take off the stereo headsets: in many states it is illegal to drive with both ears covered. Many of us don't want to spend the time and money needed to prevail in a leading case establishing that earbuds don't "cover" the ears.

Brian Aker   [12.10.07 10:22 AM]


Speaking as someone who has hearing loss, I have been talking about this for years. I have a big problem with ambient sounds in my environment. When I really want to listen to a call, I use a stereo Bluetooth set from Motorola. I just cannot hear anything with the average cell phone (and this was one of the reasons I avoided having one for so many years).

My big wish? That the set had noise cancellation as well as stereo. This would be a godsend for me and phone calls.

On a similar note I found myself walking around Tokyo last week doing a video phone call. I was amazed that I was able to walk in a crowd and have a conversation at the same time. The more amazing part was that I didn't get hit by a bicycle while doing it.


Future Technologies Converged   [12.10.07 10:27 AM]

A lot of what has been described is already among us with a combination of sophisticated mobile phones, services such as Skype, and of course Bluetooth. The audio environment is a useful addition to wearable computing and will certainly have benefits.

However, I still believe it's the visual spectrum that will provide us with most results because of two important reasons:

1. It is much more efficient to extract information from an image than listening to an audio stream
2. It is much easier to create an image which contains information than automatically creating that audio stream

This basically leaves out those audio streams that are from other people which is what a mobile call is all about anyway and we already have it.

The enhanced audio experience will be similar to 3D sound enhancement we get in a shoot-em-up game. Certainly it will make the illusion more realistic, though it may not have a huge impact on game play itself. It doesn't sell the game. It's almost the same for mobiles. The experience will be enhanced but it will be more like "ear-candy". While on the other hand, image-based wearable computing, augmented reality and virtual reality will have a much bigger effect on our mobile behaviour and sense of digital immersion in the real world.

So, 3D audio is a welcome addition to our mobiles, though there are clearly many other high-impact features to come that could dwarf 3D sound.
Future Converged

Ross Stapleton-Gray   [12.10.07 10:45 AM]

I wonder when someone will cook up the 21st century equivalent of the phone booth, to ensure a confidential conversation (bridge the booth to the carriers/shield it against other RF emissions), enhance the experience (sexy visual avatars to represent your partners in conversation), ambient mood music (cued based on who's calling, plus conversational keywords), etc...

Tim O'Reilly   [12.10.07 11:03 AM]

Working backwards through these comments:

FutureConverged: I didn't say that audio improvements would have the most impact, just that they were the *first* harbinger of more immersive mobile environments.

Mark Blafkin: and this is different in what way from all the people who are already shut off from the world in a private music reverie?

In that regard, see the recent Doonesbury strip:

It seems to me that a fuller-featured audio environment that was designed for permeability (whether from outside calls or user intervention) might actually be slightly more "heads up." But there's no question that there's some serious uncharted territory here, and that there will be a lot of Darwin awards given out in the process of figuring out both social norms and practical risks.

Brian: Noise cancellation would be awesome. But the point of the article is straightforward: we're at a tipping point where headphones and their capabilities are about to get a lot better. So I imagine that will be in the cards.

David Battino   [12.10.07 11:55 AM]

1. It is much more efficient to extract information from an image than listening to an audio stream.

Only in some cases, like when you can give your full concentration to decoding the image. Paul Lehrman notes in this month's Mix magazine,

By many measures, our hearing is more acute than our sight: The range of our aural perception (at least when we're young) covers 10 octaves, while our response to visible light barely covers one octave. The dynamic range of the human ear is about 120dB, and we can go from the lowest extreme of that range to the highest pretty much instantaneously; the dynamic range of the ocular system, taking into account both the physical and chemical changes the eye undergoes to adjust to varying light conditions — some of which can take as long as several minutes — is a mere 60dB.

I believe we're so conditioned to abysmal telephone sound quality that we don't realize what could be possible. Peter Drescher adds a pointed comment in the discussion section after the article, regarding fear of virtual reality audio headsets:

Your objection merely strengthens my conviction that this technology will happen, because few things are more attractive to teenagers than being incomprehensible to adults.

David Battino
Audio Editor
O'Reilly Digital Media

Ross Stapleton-Gray   [12.10.07 04:02 PM]

Actually, I wonder if audio isn't the thing that we'll see available as cybernetic implants first... embedding devices to deliver sound into the ear via the skull/ear tissue would be a lot less invasive/risky than anything that touches the brain.

Silicon Valley   [12.10.07 09:49 PM]

Perhaps in the next decade this technology will be integrated with video, so that an eyeglass-like gadget will allow for a 3-D movie or web browsing experience, syncing with the surround sound audio of the headset.

Future Technologies Converged   [12.11.07 02:31 AM]

To Tim O'Reilly: Thanks for clarifying. I agree with your point, though I still think it could quickly get overshadowed by visual enhancements.

To David Battino:
When comparing visual with audio content, you should consider an extremely important point: an audio stream is sequenced over time. You always need time to process it. There is no such thing as audio that can be frozen in time. In contrast, an image is processed in parallel and we usually get many cues at once from it.

The fact that our visual cortex is much larger than our auditory cortex also indicates that we do a lot more processing on images and are much more capable of doing that kind of processing.

Try navigating the world by sensing only audio (as blind people do, who are much more trained in it anyway) and the experience doesn't live up to what you get with your eyes.

Hence, we may have ultra-sensitive ears (according to your quote) but that doesn't mean we have the capability to process it. Our visual perception seems to have evolved much faster (with good reasons) and we get more out of our eyes than we do out of our ears. So I am afraid I will have to disagree. Our visual system is our best sensor.

David Battino   [12.11.07 10:46 AM]


An audio stream is sequenced over time. You always need time to process it. In contrast, an image is processed in parallel and we usually get many cues at once from it.

Interesting points, but I'd argue that hearing is more parallel than sight. Consider the case of a security guard or stock trader tasked with monitoring a bank of video monitors: It quickly becomes overwhelming.

In fact, I've read that stock traders now use a system that assigns sounds to market events. If a particular stock starts heading up, the computer might play an ascending marimba arpeggio. In that way, traders can monitor multiple feeds simultaneously. And the sounds can be quite short; it's not a case of waiting for a text-to-speech reader.
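To make the idea concrete, here is a minimal sketch of mapping a market event to a short run of notes. Everything in it is an assumption for illustration: the function names, the 1% threshold, and the choice of a major arpeggio are hypothetical, not a description of any real trading system.

```python
def arpeggio_for_change(percent_change, root_midi=60, threshold=1.0):
    """Return MIDI note numbers for a short arpeggio.

    A rise of at least `threshold` percent gives an ascending figure,
    a fall gives a descending one, and smaller moves stay silent.
    """
    if abs(percent_change) < threshold:
        return []
    # Major triad plus the octave above the root.
    notes = [root_midi + interval for interval in (0, 4, 7, 12)]
    return notes if percent_change > 0 else notes[::-1]


def midi_to_hz(note):
    """Convert a MIDI note number to frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

Giving each stock its own root note or instrument would let several feeds sound at once and stay distinguishable, which is the parallelism argued for above.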

I believe the brain places the highest priority on recognizing change in the environment. There are many examples of this in the natural world.

Of course, arguing whether sight or hearing is superior is silly in the end; what will be really exciting is when we can deliver equally realistic presentations in both media and have them synchronized. The best videogame graphics are still flat and cartoonish, whereas good "3D" audio, at least on headphones, can send a shiver down your spine.

Andrew Solodar   [12.11.07 02:28 PM]

Much of what Tim is talking about is currently being done in my company. Feel free to check out the demo at to listen to the 4D audio experience which Tim is right on target about. Feel free to contact me if you have any questions.


Future Technologies Converged   [12.11.07 10:53 PM]

To David Battino

"The best videogame graphics are still flat and cartoonish, whereas good "3D" audio, at least on headphones, can send a shiver down your spine"

Well, the current 3D graphics are now getting really amazing together with 3D sound. I agree the combination is really nice.

Ubiquitous computing is certainly going to become more mainstream and that includes all our sensors: a glowing light, a changing colour and of course sound. Change is certainly easiest to detect, and this is also applicable to our eyes.

Take your stock trader example. He can benefit from sound reflecting a stock price. But imagine if he wants to use "only" sound to do his job. It will be extremely difficult. He benefits a lot from multiple monitors and the huge combination of possible output.

I agree that 3D sound is a nice addition, but it will not be all-encompassing. Is it the next major feature to be added to mobile phones? Probably, and that's always better than what we have now.

Eric Meyer   [12.12.07 09:57 AM]

Sounds (ah HA ha) like a technologically intermediated telepathy to me. I wonder at what point the distinction between the "supernatural" and technological versions ceases to be meaningful.

Ross Stapleton-Gray   [12.13.07 12:58 PM]

Any sufficiently marketable paranormality is indistinguishable from technology.

I like the idea of layering extra sensory (as opposed to extrasensory... for that we'd need a workable brain interface, so that one could "just know" facts) input, e.g., ambient music that shifts based on observable data (you're driving, enter a high-crime area, and the music bridges to "Peter Gunn," say) or lighting raising or lowering based on seriousness of conversation.

David Battino   [12.23.07 10:01 PM]

Peter Drescher just posted a follow-up article describing the potential hardware in this virtual reality headset in more detail. He includes more application scenarios as well. For example:

Rewind: Since you've got stereo mic input and gigabytes of storage, how about a rolling 5 minute (or 5 hour) audio buffer ... a continuous "court stenographer" that lets you play back anything you've ever heard. Someone tells you a great joke -- voice command "Save buffer" stores it in a date, time, and geo-tagged file for later retrieval (and sharing).
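The rolling buffer in that scenario is essentially a fixed-size queue that drops the oldest audio as new audio arrives. A minimal sketch, assuming a hypothetical `RollingAudioBuffer` class and plain lists of samples standing in for real microphone input:

```python
from collections import deque


class RollingAudioBuffer:
    """Keep only the most recent `seconds` of audio samples."""

    def __init__(self, seconds, sample_rate=44100):
        self.sample_rate = sample_rate
        # A deque with maxlen discards the oldest samples automatically.
        self._samples = deque(maxlen=int(seconds * sample_rate))

    def write(self, samples):
        """Append newly captured samples to the buffer."""
        self._samples.extend(samples)

    def save_buffer(self, tags=None):
        """Snapshot the current contents with optional metadata tags."""
        return {"samples": list(self._samples), "tags": dict(tags or {})}
```

A "Save buffer" voice command would then call `save_buffer` with whatever date, time, and location tags the device can supply. How those tags are obtained is outside this sketch.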
