Why Google Is Offering 411 Service

I haven’t seen a lot of people connecting the dots between Google’s recent announcement of 411 service and Microsoft’s acquisition of Tellme.

Now obviously, there’s one connection in that both are plays in local search, and it’s certainly true that providing 411 service is consistent with Google’s mission to provide “access to all the world’s information” and Microsoft’s desire to steal a march on them in voice-activated mobile search.

But it also seems to me that there’s a hidden story here about the speech recognition itself. I was talking recently to Eckart Walther of Yahoo!, who used to be at Tellme, and he pointed out that speech recognition took a huge leap in capability when it started being used for automated directory assistance. All of a sudden, there were millions of voices and millions of accents to train speech recognition systems on, and much less need for the individual user to train the system.

This is reminiscent of a comment that Peter Norvig, Director of Research at Google, made to me last year about automated translation, and why it’s getting better. “We don’t have better algorithms. We just have more data.”
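Norvig’s point can be made concrete with a toy sketch: the same trivial bigram next-word predictor, fed more text, simply covers more contexts. The corpora below are made up purely for illustration; this is not a real speech or translation model.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Build a bigram model: for each word, count the words that follow it."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        model[a][b] += 1
    return model

def predict(model, word):
    """Most likely next word, or None if the context was never seen."""
    nexts = model.get(word)
    return nexts.most_common(1)[0][0] if nexts else None

# Same algorithm, different amounts of data.
small = train_bigrams("call directory assistance")
large = train_bigrams(
    "call directory assistance . call the operator . "
    "directory assistance connects your call"
)

print(predict(small, "the"))  # None - context unseen in the small corpus
print(predict(large, "the"))  # operator
```

The algorithm never changes; only the data does, which is the whole argument.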

In short, I’m speculating that the 1-800-GOOG-411 service is designed to harvest voice data to build Google’s own speech database, rather than licensing one from Nuance or another player.

If I’m right about this, we see here another demonstration of my Web 2.0 principle that “data is the Intel Inside”, and that many of the future battles between industry giants will be around who owns data, rather than who controls software APIs. In that battle, we’ll see all kinds of techniques deployed to “harness collective intelligence” and build added-value databases of various kinds.

One wouldn’t think that one of the side effects of a voice search application would be the creation of a competitive advantage in speech recognition, but I’ll lay odds that that’s part of what’s in play here.

Anyone who has confirming information on this speculation, please let us know.

  • http://www.webware.com Rafe

    Makes sense. So eventually Goog will be offering context-based ads based on streaming audio content?

  • Jake Lockley

    I thought it was because Yahoo! offers a free 411 service and has been getting press about it lately.

  • Steve Clark

    Hello Tim, it has been a long time.

    I don’t have “direct” confirmation of your assertion, but I think you’re mentioning the low-hanging fruit of databases. Ray Kurzweil has repeatedly mentioned the extreme power of collective data (see latest book), and this bit by Google follows the exponential pattern.

    In the voice arena, I don’t see it as a voice database, or “improved” speech recognition. It was *slow*, but I heard one lab’s version of a killer service = realtime, worldwide, universal translation.

    In search, it is the approaching useful AI assistant, which arrives in a seamless fashion with each new voice we encounter. Today’s SF Bay Area 511 traffic system is a mere newborn, maturing *much* faster than we are.

    We aren’t far from these now, making “Web 2.0” services feel archaic. -Steve

  • http://blog.nextblitz.com gz

    Fascinating to think about, Tim, excellent point. Even the basic contextual data available from providing a 411 service can be gold for Google (especially paired w/ our online Google data) – and it can potentially build the speech database at the same time, which could be even more valuable…brilliant.

  • http://www.raduchel.com William J. Raduchel

    If true, this is a clever but old idea. At the New York Worlds Fair in 1964, IBM had an exhibit where you could get the New York Times headlines from your birthday. You had to print the birthday on an 80-column card, and IBM used those cards as its database for character recognition. Anytime you can collect analog input with its digital meaning you have value.

  • http://schestowitz.com Roy Schestowitz

    This could give a whole new meaning to the disclaimer “your call may be recorded for _training_ purposes.”

  • James Lowe Jr

    After just a brief tour of your site I read thoroughly three of your articles and found them greatly informative. In reference to goog411, my understanding is that it is geared to find only business numbers, not residential. This is fine with me because our Yellow Pages in the Philly area are far from all-inclusive with business listings; it’s too expensive. I will completely give up the old Yellow Pages after writing this and gaining confidence on my PC. Data, Data, Data, that is where it’s at for the “big guys”. Thanks again.

  • Toby

    More (data) is better in speech recognition training, and Google is all about more data. Still, it’s not clear that GOOG411 will provide anything that IBM/AT&T/Nuance haven’t had for a number of years.

    Generally one needs correctly transcribed speech data. This can be expensive to do manually, but workarounds involve closed captions and capture of successful dialogs.

    Individual calls will have greater variability, but thousands of hours of speech have been available for some time in the form of closed-captioned broadcast news; with newsreaders, correspondents and interviewees providing variation in voice and bandwidth (broadband speech vs telephone).

    Still, you’ll recall the fiasco with Google’s acquisition of Kai-Fu Lee. Oh, and this http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html – they’re quietly active in this area.

  • Ken Williams

    It makes a lot of sense, but remember that voice data over the telephone is pretty junky compared with voice data in other arenas. Telephone signals cut out a lot of the high & low frequency bands that can be very useful in speech recognition (by humans or otherwise). So this low-quality data could be useful for bootstrapping themselves into the recognition arena, but the real fruits will come if they start offering a net-connected voice recognition service as part of Google Desktop. =)

  • http://www.adoredbyhordes.blogspot.com MarcLord

    Ken Williams,

    Telephone data is the most valuable precisely because it is low quality, and it is most similar to the environments in which the vast majority of human speech occurs–those with relatively poor SNR and variably high ambient noise levels. Collecting data through phone searches also gives you massive amounts of those data strings which reach into the longest, even unique, parts of the data incidence tail. Having to recognize that kind of data forces you to home in on the eigenvalues which best handle it. Then if you correct every single error state and feed it back into your model, you will achieve something approaching perfection.

    So the junkiness of phone data compared to voice data in other areas is highly relevant. You want lots of dirty data, and only a little clean. It’s the obvious way to build a proficient speaker-independent recognizer.

  • Stephanie Ryan

    ONEV is now implementing VR DA for business and residential in Mexico, for Carlos Slim’s Telnor, which, of course, is in Spanish.

  • karl

    OneVoice technology for Carlos Slim, and for the United States also.

  • lafata

    Google has been talking to Cognimatics about face recognition software, too.

  • Munira Majmundar

    I monitor, gather and analyze business intelligence for a major corporation, and your analysis of why Google introduced 411 seems quite accurate. You may have read the following article on Nuance; here is my analysis of it: Nuance Unveils Nuance Voice Search, Breakthrough Solutions for Automated Directory Assistance
    Nuance’s Voice Search (NVS) gives callers faster and easier access to the information they need. This, in turn, allows carriers to automate their calls and reduce costs. NVS introduces newly created capabilities for automated DA with interpretive capabilities and new applications for BCS and ads-supported search. NVS delivers a more natural and interactive experience for callers. Unlike traditional systems, which fail when they cannot match a caller’s request to an exact listing from the database, NVS can interpret a search request irrespective of ambiguity or incomplete information. NVS is crafted specifically for the task of applying speech to DA requests. Its revolutionary features include ‘Unsupervised Learning’ and ‘Dynamic Disambiguation’.
    The ‘Unsupervised Learning’ feature uses artificial intelligence to enable the system to continuously and automatically learn from calls manually routed by an operator and refine voice recognition. For example, if Joe’s BBQ Shack is frequently requested by callers as “Joe’s Barbecue,” then as operators manually direct requests for Joe’s Barbecue to the Joe’s BBQ Shack listing, NVS will recognize that trend and begin to automate that routing rule. Similarly, the ‘Dynamic Disambiguation’ feature reduces the number of qualifying questions asked of callers before reaching a desired listing. For example, if a caller says, “Uh, the Safeway please,” traditional systems will typically respond with “I’m sorry, I didn’t understand that, just say the listing name,” then ask “on what street?” and finally perhaps “for what department?” By contrast, NVS reduces the steps from three to one by simply saying: “For Safeway, I found the following listings – Safeway corporate headquarters, Safeway market on St. Josephs Street, and Safeway customer service on Stoneridge Ave. Which do you prefer?” The result is a very positive caller experience, quickly connecting callers to the information they need, in a fully automated way.
    Implications:
    Proliferation of free 411 service providers is a result of their ability to rely on sophisticated automation. Automation reduces costs while delivering nearly the same quality of service as a live operator, if not better. As noted by Matt Booth of the Kelsey Group, by hooking the automated service into ads-supported local business information, companies like Google could slash the costs of providing DA to around 2 cents per call, while generating around 10 cents for each business referral. Besides offering automated voice search, NVS also offers BCS and free DA.
    http://www.nuance.com/news/pressreleases/2007/20070410_voicesearch.asp
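The ‘Unsupervised Learning’ behavior described in that press release can be sketched in a few lines of Python. Everything here (the class name, the threshold, the data structures) is an illustrative assumption, not Nuance’s actual implementation: count how often operators manually route a spoken request to a listing, and promote the mapping to an automatic routing rule once it becomes frequent.

```python
from collections import Counter, defaultdict

class RoutingLearner:
    """Learn automatic call-routing rules from operator-handled calls."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = defaultdict(Counter)  # spoken request -> listing -> count
        self.rules = {}                     # learned automatic routings

    def observe_manual_route(self, spoken, listing):
        """Record that an operator routed `spoken` to `listing`."""
        self.counts[spoken][listing] += 1
        best_listing, count = self.counts[spoken].most_common(1)[0]
        if count >= self.threshold:
            self.rules[spoken] = best_listing

    def route(self, spoken):
        """Return the automated listing, or None if an operator is needed."""
        return self.rules.get(spoken)

learner = RoutingLearner()
for _ in range(3):
    learner.observe_manual_route("Joe's Barbecue", "Joe's BBQ Shack")

print(learner.route("Joe's Barbecue"))  # Joe's BBQ Shack
print(learner.route("Safeway"))         # None - still needs an operator
```

The appeal of this scheme is that the operators’ everyday work is the labeled training data; no separate transcription effort is required.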

  • diego

    “Calls recorded for quality”. There’s your data collection. It would actually be ridiculous for them to throw away all that nice data.

  • joe mac

    Do you think that Google is recording the actual conversations themselves?
    It never indicates whether that part of the conversation is being recorded for quality as well. Perhaps it’s a way to exponentially increase the hours of data per call, as well as to monitor keywords in conversations (for national security).

    If they are connecting you through their lines and they inform you the call is recorded, there’s no way to know when they stop recording. Could be an interesting way to beta test a national-security citizen-terrorist monitoring system.

    1984 – it’s coming.

  • http://www.urbanmvp.com J.O. Urban

    Interesting theory, Tim. Wouldn’t such actions by Google open it up to all sorts of legal actions if this data actually ends up creating profits for Google in the future? I mean, contextual and behavioural data gathering on an online user is different. But when it comes to data such as a person’s voice/accent/tone, which are much more personal, I can see a large number of people taking issue with this type of use without financial compensation.

  • delongdrive

    It would make sense that Google’s using its 411 service as a front to funnel voice data, rather than to provide a long-term service. They haven’t even found ads to pay for overhead or provide human tech support (and folks will need it if they’re still working the kinks out of the voice recognition), which services like 800-free-411 offered years ago. Users will get mad, maybe even “iPhone discount” mad, if Google decides to just pull the plug once it’s done recording everyone’s conversations :P

  • http://www.radiocamp.com Gregg McVicar

    Here’s a new Google service I would like to see out of this: online audio transcription. Drag & Drop an audio file and it would be compressed, uploaded, transcribed and emailed back to you (no doubt with Google Ads based on the content).

    There could be a free version, pay-for-privacy versions and eventually all manner of translation options. The value of being able to do this from virtually anywhere, in faster-than-real-time, would be phenomenal.

    I’m just saying, this would be cool. Hint, hint.

  • Smally

    I think you are all crazy. The human brain will always be better at anything that even remotely involves interpretation (thinking) than even the best (fastest) computer with the most data.

    A hundred or so people, properly trained on how to use a database to retrieve data, could run circles around any automated 411 system.

    Just one of the reasons that Southwest Airlines still answers the phone. It is much faster and cheaper for them AND their customers.

  • http://tim.oreilly.com Tim O'Reilly

    Smally –

    I don’t disagree. I would always prefer to talk to a human operator. And a human operator partnering with a machine (because that’s what you’re talking about too — not an unaided human operator) will outperform the machine alone, at least for quite some time to come. But that wasn’t the point of the post. The point was that Google was offering 411 services in order to train their speech recognition algorithms. Whether or not a human would be better is irrelevant. Imagine having enough operators to answer all Google queries by phone; it isn’t possible. One day soon, it may be possible to do a decent job via speech recognition.

  • Noname

    You are right.

  • http://bit.ly/rsvp Adriano

    Back in 2007, Marissa Mayer, Google VP said: “Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search. The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.” Source: http://blogoscoped.com/archive/2007-12-17-n30.html

    Tim, so your speculation is on target. I suspect that Google Voice currently uses the same methods to transcribe voicemails into text/emails.

    Re: more data v. better algorithms, see the recent video lecture by Peter Norvig :: Statistical Learning as the Ultimate Agile Development Tool: http://ff.im/aTcNn

    Norvig is also co-author with others at Google Research on this interesting paper: The Unreasonable Effectiveness of Data :: IEEE Intelligent Systems (March/April 2009), http://ff.im/3lxk7