Watson, Turing, and extreme machine learning

The real value of the Watson supercomputer will come from what it inspires.

One of the best presentations at IBM’s recent Blogger Day was given by David Ferrucci, the leader of the Watson team, the group that developed the supercomputer that recently appeared as a contestant on Jeopardy.

To many people, the Turing test is the gold standard of artificial intelligence. Put briefly, the idea is that if you can’t tell whether you’re interacting with a computer or a human, a computer has passed the test.

But it’s easy to forget how subtle this criterion is. Turing proposes changing the question from “Can machines think?” to the operational criterion, “Can we distinguish between a human and a machine?” But it’s not a trivial question: it’s not “Can a computer answer difficult questions correctly?” but rather, “Can a computer behave in ways that are indistinguishable from human behavior?” In other words, getting the “right” answer has nothing to do with the test. In fact, if you were trying to tell whether you were “talking to” a computer or a human, and got only correct answers, you would have every right to be deeply suspicious.

Alan Turing was thinking explicitly of this: in his 1950 paper, he proposes question/answer pairs like this:

Q: Please write me a sonnet on the subject of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34,957 to 70,764.

A: (Pause about 30 seconds and then give as answer) 105,621.

We’d never think of asking a computer the first question, though I’m sure there are sonnet-writing projects going on somewhere. And the hypothetical answer is equally surprising: it’s neither a sonnet (good or bad), nor a core dump, but a deflection. It’s human behavior, not accurate thought, that Turing is after. This is equally apparent with the second question: while it’s computational, just giving an answer (which even a computer from the early ’50s could do immediately) isn’t the point. It’s the delay that simulates human behavior.
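As a quick check of the arithmetic in Turing's second exchange: 34,957 plus 70,764 is 105,721, so the quoted answer of 105,621 is off by exactly 100. That is what you get if you add column by column on paper and forget the carry into the hundreds column. The sketch below is only an illustration of that slip; the function name and the `drop_carry_into` parameter are hypothetical, and nothing here claims to show how Turing arrived at the figure.

```python
def column_add(a, b, drop_carry_into=None):
    """Add two non-negative integers digit by digit, as on paper.

    drop_carry_into is an optional column index (0 = units, 1 = tens,
    2 = hundreds, ...) whose incoming carry is forgotten, to mimic a
    human slip. This parameter is an illustrative assumption.
    """
    result, carry, column = 0, 0, 0
    while a or b or carry:
        if column == drop_carry_into:
            carry = 0  # the forgotten carry
        total = a % 10 + b % 10 + carry
        result += (total % 10) * 10 ** column
        carry = total // 10
        a //= 10
        b //= 10
        column += 1
    return result


correct = column_add(34957, 70764)
slipped = column_add(34957, 70764, drop_carry_into=2)
print(correct, slipped)
```

With the carry into the hundreds column dropped, the paper-and-pencil method reproduces the figure printed in the exchange.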

Dave Ferrucci, IBM scientist and Watson project director

While Watson presumably doesn’t have delays programmed in, and appears only in a situation where deflecting a question (sorry, it’s Jeopardy, deflecting an answer) isn’t allowed, it’s much closer to this kind of behavior than any serious attempt at AI that I’ve seen. It’s an attempt to compete at a high level in a particular game. The game structures the interaction, eliminating some problems (like deflections) but adding others: “misleading or ambiguous answers are par for the course” (to borrow from NPR’s “What Do You Know”). Watson has to parse ambiguous sentences and decouple multiple clues embedded in one phrase to come up with a question. Time is a factor, and more than time, confidence that the answer is correct. After all, it would be easy for a computer to buzz first on every question (electronics does timing really well), but buzzing first whether or not you know the answer would be a losing strategy for a computer, as well as for a human. In fact, Watson would handle the first of Turing’s questions perfectly: if it isn’t confident of an answer, it doesn’t buzz, just like a human Jeopardy player.
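To see why buzzing on everything loses, here is a minimal sketch of the expected-value reasoning. It assumes only the basic Jeopardy scoring rule (a correct response adds the clue's value, a wrong one subtracts it) and ignores opponents, Daily Doubles, and the state of the game. The function names and the confidence figures are made up for illustration; this is not Watson's actual buzzing strategy.

```python
def expected_gain(confidence, clue_value):
    """Expected score change from buzzing.

    Gain clue_value with probability `confidence`,
    lose clue_value with probability 1 - confidence.
    """
    return (2 * confidence - 1) * clue_value


def should_buzz(confidence, clue_value):
    """Buzz only when the expected score change is positive."""
    return expected_gain(confidence, clue_value) > 0


# Hypothetical (confidence, clue value) pairs, invented for this sketch.
clues = [(0.9, 400), (0.2, 800), (0.6, 1000), (0.1, 2000)]

always = sum(expected_gain(c, v) for c, v in clues)
selective = sum(expected_gain(c, v) for c, v in clues if should_buzz(c, v))
print(f"always buzz: {always:.0f}, buzz when confident: {selective:.0f}")
```

Under these assumptions the break-even point is a confidence of 0.5: below that, staying silent (or, in Turing's terms, deflecting) is the better move.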

Equally important, Watson is not always right. While the film clip on IBM’s site shows some spectacular wrong answers (and wrong answers that don’t really duplicate human behavior), it’s an important step forward. As Ferrucci said when I spoke to him, the ability to be wrong is part of the problem. Watson’s goal is to emulate human behavior on a high level, not to be a search engine or some sort of automated answering machine.

Some fascinating statements are at the end of Turing’s paper. He predicted computers with a gigabyte of storage by 2000 (roughly correct, assuming that Turing was talking about what we now call RAM), and thought that we’d be able to achieve thinking machines in that same time frame. We aren’t there yet, but Watson shows that we might not be that far off.

But there’s a more important question than what it means for a machine to think, and that’s whether machines can help us to ask questions about huge amounts of ambiguous data. I was at a talk a couple of weeks ago where Tony Tyson talked about the Large Synoptic Survey Telescope project, which will deliver dozens of terabytes of data per night. He said that in the past, we’d use humans to take a first look at the data and decide what was interesting. Crowdsourcing analysis of astronomical images isn’t new, but the number of images coming from the LSST is too large even for a project like GalaxyZoo. With this much data, using humans is out of the question. LSST researchers will have to use computational techniques to figure out what’s interesting.

“What is interesting in 30TB?” is an ambiguous, poorly defined question involving large amounts of data, not that different from Watson. What’s an “anomaly”? You really don’t know until you see it, just as you can’t parse a tricky Jeopardy answer until you see it. And while finding data anomalies is a much different problem from parsing misleading natural language statements, both projects are headed in the same direction: they are asking for human behavior in an ambiguous situation. (Remember, Tyson’s algorithms are replacing humans in a job humans have done well for years.) While Watson is a masterpiece of natural language processing, it’s important to remember that it’s just a learning tool that will help us to solve more interesting problems. The LSST and problems of that scale are the real prize, and Watson is the next step.
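For contrast, here is the narrow, purely statistical reading of “anomaly”: a value that sits far from the bulk of a distribution. The sketch below uses a median-based (robust) z-score, a standard textbook technique, and it assumes you already know which single quantity to measure, which is exactly the part that is hard in practice. The function name, the threshold, and the sample numbers are all assumptions for illustration; this is not how the LSST pipeline works.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return values whose robust z-score exceeds `threshold`.

    The robust z-score uses the median and the median absolute
    deviation (MAD) instead of the mean and standard deviation, so a
    single extreme value does not hide itself by inflating the spread.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Invented brightness readings; the last one is the planted oddity.
brightness = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(flag_outliers(brightness))
```

A check like this finds the anomalies you thought to define in advance; it says nothing about the ones nobody anticipated.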

Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.


  • Mark de Visser

    Q: Add 34,957 to 70,764.

    A: (Pause about 30 seconds and then give as answer) 105,621.

    :-) Giving wrong answers certainly adds credibility to the suspicion that it was a human who created this answer and not a computer. Artificial lack-of-intelligence?

    • You’ll have to ask Alan Turing about his addition; this is cut-and-paste from the original paper. Seriously, I think it likely that the error is intentional (which might make it the first Easter Egg in the history of computing). Turing understood that you don’t have a machine that can think like a human if it can’t make mistakes. In terms of his operational criteria, if we are asking questions of something that is always right, we should be suspicious that it is a machine.

      So, yes, I do believe that artificial stupidity is part of artificial intelligence. And I think Turing would have agreed.

  • Alex Tolley

    What’s an “anomaly”? You really don’t know until you see it.

    Is that really true? Isn’t it something sitting outside the expected range of the distribution curve for the measured pattern? What those patterns are is determined by experience and perception, but not in principle difficult to program in to the computer. In other words we can program the expected patterns and use the algorithms to find objects/relationships that do not fit the pattern, like a square shaped galaxy, or a perfect circle of a dozen stars.

    On your larger point, just how helpful is it to have machines that think like humans, unless you want tools that interact with humans better? Machines that complement our thinking are more useful for most activities. Machines who think like us might be a better tool for understanding how humans think, rather than as useful tools to aid our work.

  • Andy

    > recently appeared as a contestant on Jeopardy.

    When did this happen? I’ve seen lots of coverage that it *will*, but not that it *has*. What were the results?

  • bowerbird

    mike said:
    > This is equally apparent with the second question:
    > while it’s computational, just giving an answer
    > (which even a computer from the early ’50s
    > could do immediately) isn’t the point.
    > It’s the delay that simulates human behavior.

    well, the delay is _part_ of it, to be sure…

    but so is the error, and specifically the _type_ of error.

    it’s a failure to carry a 1, which is the kind of mistake
    that a human being would make.

    so i think you whiffed on this one completely, mike…

    whereas i think turing knew exactly what he was doing.

    but, you know, maybe that’s just me…


    alex, great points all around…


    andy, the computer (“watson”) played a mock game of jeopardy.

    > http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?pagewanted=print

    i think they want to give watson a big boost of self-confidence
    before he goes face-to-face against the intimidating alex trebek.
    alex — a ladies man — is said to have a distinct preference for
    humans, and harbors a grudge against what he calls “the robot”.

    plus watson needs some more practice with the signaling device;
    his handlers report that it befuddles him, and “he’s all thumbs”…


  • Brian Ahier

    Have some fun :-)

    You can play NYTs interactive simulation against Watson here:


  • Brad Arnold

    I remember 20 years ago, it was conventional wisdom that a computer would never beat the best man at chess. HA! Now, conventional wisdom is that AI is in a primitive stage. Let me tell you that corporations are rushing to create a computer mind that surpasses the human mind. It would give an incredible competitive advantage to that company. It would surprise most people the strides that have been made in AI in the last couple of decades. I am not sure if an AI has already been created (or let loose) that surpasses the human mind, but the ramifications of a mind existing on Earth that is a step above the human mind are extraordinary (as an example – it would probably usher in the Singularity). On the other hand, the lack of regulation in this area opens up the very real (and tragic) likelihood that the first AI that evolves will not be friendly to human goals.

  • Espen Andersen

    The point of the Turing test is that both the human and the computer, interacting over a teletype, roughly equivalent to Twitter today, should try to convince the observer that they are the human.

    Hence, both the delay and the computational error are features, not errors.

    Incidentally, for those who haven’t: Read the paper. Amazingly ahead of its time, and very well written.

  • Russell

    What is the point of the Turing test? A clever concept, but a stupid goal?

    Calculator on ->
    65467 * 53432 =
    “Sorry, not in the mood”

    Meet the bin.

  • Esmail

    > On the other hand, the lack of regulation in this area opens up the
    > very real (and tragic) likelihood that the first AI that evolves will
    > not be friendly to human goals.

    Perhaps a version of Asimov’s Laws of Robotics might be applicable:

    1. A robot may not injure a human being or, through inaction, allow a
    human being to come to harm.

    2. A robot must obey any orders given to it by human beings, except
    where such orders would conflict with the First Law.

    3. A robot must protect its own existence as long as such protection
    does not conflict with the First or Second Law.

  • Mike

    Great article. I look forward to where this research goes.
    Just wanted to say I found at least 3 typos in 5 minutes of reading Turing’s paper from the link…so you may not want to attribute the math error to him. Might be that errors were introduced when the paper was posted online.