Watson and the future of machine learning

Watson opens the door to conversations, not just answers.

Last Tuesday, I attended a symposium at IBM Research about Watson, the computer that just defeated two of the strongest Jeopardy champions in history. Winning at Jeopardy is an extremely difficult problem for a computer: natural language processing (NLP) is difficult enough by itself, but in Jeopardy, you have to deal with language that is intentionally ambiguous. Puns, misleading statements, and irrelevant detail are the norm; topic categories usually have little to do with the actual questions.

But I was less interested in the NLP than in some of the other “thought processes” that go into Watson. Dr. David Ferrucci, Watson’s inventor, talked about two elements that are just under the surface. In Jeopardy, you see only questions and answers. But Watson (like many machine learning systems) arrives at an answer* by generating a large number of possible answers, each attached to a score, or confidence level. The highest confidence level wins. Some strategy ties the confidence level into whether (and when) to buzz in, but that’s less interesting to me. What’s more important, as we think about real-world applications for Watson’s DeepQA technology, is the list of possible answers. Watson proved that it can make mistakes (and elsewhere, I’ve argued that “artificial stupidity” is an essential component of artificial intelligence). Once we realize that we’re not expecting a computer to dispense correctness, but to help us solve a real-world problem, the list of possibilities becomes relevant.

I recently read an article in the New York Times about a woman who had a poison-ivy-like rash that turned out to be not poison ivy, but an uncommon reaction to undercooked shiitake mushrooms. How would we want a medical diagnostics application to behave in a puzzling case like this? Presented with a set of symptoms and a database of medical knowledge, its first suggestion would probably be “poison ivy,” just as it was for the human MDs. But unlike humans, computers aren’t blinded by the obvious. Once poison ivy is rejected, the next step is to look at the answers with lower confidence ratings; that’s where you would expect to find alternative explanations. How far down the list do you have to go before you reach the correct diagnosis? I don’t know, but the process isn’t finished until you do.

The next level down in Watson’s analysis is even more interesting. The confidence level assigned to each answer comes from how well the answer matched various sources of information. Possible answers are scored against a number of data sources; these scores are weighted and combined to form the final confidence rating. If exposed to human users, the scoring process completely changes the kind of relationship we can have with machines. An answer is one thing; a series of alternate answers is something more; but when you’re looking at the reasons behind the answers, you’re finally getting at the heart of intelligence. I’m not going to talk about the Turing Test. But I am suggesting that, when you have the reasons for the alternative answers in hand, you’re suddenly looking at the possibility of a meaningful conversation between human and machine.
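That scoring process can be sketched as a toy model. Everything below is invented for illustration (Watson’s real evidence sources, scores, and learned weights are far more elaborate), but the shape is the same: each candidate answer is scored against several sources, and a weighted combination produces the final confidence that determines the ranked list.

```python
# Toy sketch of DeepQA-style answer scoring. All source names and
# numbers below are hypothetical, invented for illustration.

# Per-source scores for each candidate answer.
evidence = {
    "poison ivy":         {"textbooks": 0.9, "case_reports": 0.7, "taxonomy": 0.8},
    "contact dermatitis": {"textbooks": 0.8, "case_reports": 0.6, "taxonomy": 0.7},
    "shiitake reaction":  {"textbooks": 0.3, "case_reports": 0.8, "taxonomy": 0.2},
}

# Weights reflecting how reliable each source has proven to be.
weights = {"textbooks": 0.5, "case_reports": 0.3, "taxonomy": 0.2}

def confidence(scores):
    """Weighted combination of per-source scores into one confidence."""
    return sum(weights[source] * s for source, s in scores.items())

# The full ranked list -- not just the top answer -- is what makes a
# conversation possible.
ranked = sorted(evidence, key=lambda a: confidence(evidence[a]), reverse=True)
for answer in ranked:
    print(f"{answer}: {confidence(evidence[answer]):.2f}")
```

The interesting part isn’t the arithmetic; it’s that both the ranked list and the per-source scores behind it survive the computation and could be shown to a user.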

Let’s think about the mushrooms again, and imagine the conversation that might take place. Answer #1, poison ivy, has been ruled out. So the humans can ask Watson, “What are the other alternatives?” Perhaps answer #4 is shiitake mushrooms. Now the humans can ask, “What makes you think it’s mushrooms?” and see how the answer’s score was generated, and how it compares to other highly ranked answers. Once you have the criteria on which the confidence level is based, you can ask followup questions like “Would you change your mind if you weighted the data sources differently?” “How would your answer change if we told you the rash wasn’t itchy?” “What other tests should we do to confirm or reject the hypothesis?”
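The “would you change your mind?” question has a concrete meaning once the scoring is exposed: rerun the weighted combination under a different weighting and see whether the ranking flips. A minimal sketch, with invented sources, scores, and weights:

```python
# Toy sketch of "would you change your mind if you weighted the data
# sources differently?" All names and numbers are invented.
scores = {
    "poison ivy":        {"textbooks": 0.9, "case_reports": 0.4},
    "shiitake reaction": {"textbooks": 0.3, "case_reports": 0.9},
}

def rank(weights):
    """Rank answers by weighted confidence under a given weighting."""
    conf = {a: sum(weights[s] * v for s, v in per_source.items())
            for a, per_source in scores.items()}
    return sorted(conf, key=conf.get, reverse=True)

# Trusting textbooks keeps poison ivy on top; trusting case reports flips it.
print(rank({"textbooks": 0.7, "case_reports": 0.3}))
print(rank({"textbooks": 0.3, "case_reports": 0.7}))
```

When the top answer is sensitive to the weighting like this, that sensitivity is itself useful information for the humans in the conversation.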

You can see where this is leading. The top-rated answer is less important than the process that leads up to it. And as we go beyond strict question-and-answer situations, we find applications that are all about the process. Take LinkedIn’s scarily accurate recommendations for connections. They use machine learning to figure out who, in their database, you’re likely to know. But when they recommend a connection you don’t already know, you might want to ask, “Why did you recommend him?” And the answer could be very interesting: shared interests, mutual friends. All of a sudden, “people you may know” becomes “people you might like to know,” “people you might find interesting,” “people you should ask if you have a question about X.” That’s a different and more intriguing kind of service. And it’s potentially conversational in a way that current human-computer interaction isn’t. Instead of sites like Stack Overflow (useful as it is), we could build sites that support a conversation about problem solving: asking and refining questions, searching across many different sources of information, and evaluating different answers. Again, the point isn’t the answer, but the conversation enabled by looking “under the covers” at how Watson works.
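A recommender that can answer “why did you recommend him?” might look, in miniature, like this. The connection graph and the mutual-friends heuristic are invented for illustration; LinkedIn’s actual system is of course far richer. The point is that keeping the evidence alongside the score is what makes the explanation possible.

```python
# Toy sketch of an explainable "people you may know" recommender: rank
# candidates by mutual connections, and keep the evidence so we can
# answer "why did you recommend him?" The graph below is invented.
from collections import Counter

connections = {
    "you":   ["ada", "bob", "carol"],
    "ada":   ["you", "dave", "erin"],
    "bob":   ["you", "dave"],
    "carol": ["you", "erin"],
}

def recommend(person):
    """Return (candidate, mutual_friends) pairs, best candidates first."""
    mutuals = Counter()
    reasons = {}
    for friend in connections[person]:
        for fof in connections.get(friend, []):
            if fof != person and fof not in connections[person]:
                mutuals[fof] += 1
                reasons.setdefault(fof, []).append(friend)
    return [(cand, reasons[cand]) for cand, _ in mutuals.most_common()]

for candidate, mutual_friends in recommend("you"):
    print(f"{candidate}: you both know {', '.join(mutual_friends)}")
```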

Now that we can build machines that can answer tough and ambiguous questions, the next step is to realize that the answer to a question isn’t the end of the process. What characterizes human thought isn’t the ability to produce answers, but the process that leads up to them: assembling and evaluating alternatives, both internally and in conversation with others. So when we’re thinking about Watson, we need to ask what we can do with the information that led up to the answer. What kinds of user interfaces will we need to interact with the layers of knowledge underneath the answer? How do you craft a user experience for conversing with a machine? This is the future of machine learning: not just answers, but the first steps toward a conversation. That’s where Watson is leading us.

* I must revert to normal terminology. In Jeopardy, of course, the contestants are given answers and have to come up with questions.

Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.


  • Alex Tolley

    Good perspective, and a welcome diversion from the articles about how Watson won unfairly, yada, yada.

    The obvious fictional parallel to your thoughts is the failure of Bowman and Poole to question HAL about why he thought the AE35 unit was about to fail. Similarly, while Asimov’s Liar! had Herbie mind reading, how would we deal with computers that learned to take into account human feelings and responses?

    IBM wants Watson to become a medical assistant, much like your diagnosis example. The assistant plays Spock to the physician’s Kirk. But at some point, its role will become the physician’s. Diagnoses will inevitably lead to suggesting treatments, and then the machine will need to take into account and weigh other factors.

    Watson-like assistants could find a host of roles to play in the future, opening up all sorts of possibilities for us to learn about the world and of course fall into new traps.

  • There is the small issue that Jeopardy! has a format that suits the approach that Watson takes, or rather suits statistical AI. Being provided with the answer and having to find the question isn’t the same as being asked a question and finding the answer. In Jeopardy! the clue is long and has lots of information that allows the algorithms to pin down an entry in a knowledge base corresponding to candidate items that can then be used to form a simple question with few words – like “What is Toronto?”

    It often doesn’t even matter that the form of the question isn’t quite right for the answer given as the clue. Watson isn’t a question answering machine – it is an answer questioning machine and it could well be that this is the easier of the two directions. In short, unless IBM can find applications that mimic Jeopardy!, Watson might well be a very expensive dead end.

    I discuss this more in the post:

  • Edward

    Nice post

    Similarly, it would be useful for Google to tell me, “I had four different ways of understanding your search, and interpreted it as follows,” so I could select a different analysis of my query to get accurate results.

    In other words, search engines don’t need to just churn out a result, which is very difficult because the search engine has to understand what you are really looking for (e.g., if I search for Beethoven, do I mean the composer or the computer language?). They can be interactive and ask you what you mean.

  • I think there is another level of artificial intelligence that could be deduced from this. Given that Watson has a ‘complete’ set of answers and we can assess the statistical difference between them, wouldn’t it make sense to ask Watson what questions you could put to a patient to increase the statistical difference, and therefore generate a more confident conclusion? Using the above example, after poison ivy had been excluded, you might ask Watson what question would narrow down the cause. Watson might suggest asking whether the patient has eaten any shiitake mushrooms recently, knowing that an affirmative answer would increase the probability of the correct diagnosis significantly.
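    This question-selection idea is essentially a value-of-information calculation: pick the question whose answer is expected to shrink the uncertainty over diagnoses the most. A toy sketch of one way to do it, with all diagnoses and probabilities invented for illustration:

```python
# Toy sketch of choosing the most informative question: pick the yes/no
# question that minimizes the expected entropy of the diagnosis
# distribution. All probabilities below are invented for illustration.
import math

# Current confidence in each remaining diagnosis (poison ivy ruled out).
prior = {"contact dermatitis": 0.5, "shiitake reaction": 0.3, "drug eruption": 0.2}

# P(patient answers "yes" | diagnosis), for each candidate question.
p_yes = {
    "Eaten shiitake mushrooms recently?": {"contact dermatitis": 0.05,
                                           "shiitake reaction": 0.95,
                                           "drug eruption": 0.05},
    "Is the rash itchy?":                 {"contact dermatitis": 0.90,
                                           "shiitake reaction": 0.90,
                                           "drug eruption": 0.80},
}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_entropy(question):
    """Average posterior entropy over the possible yes/no answers."""
    total = 0.0
    for likelihood in (p_yes[question],
                       {d: 1 - p for d, p in p_yes[question].items()}):
        p_answer = sum(prior[d] * likelihood[d] for d in prior)
        if p_answer > 0:
            posterior = {d: prior[d] * likelihood[d] / p_answer for d in prior}
            total += p_answer * entropy(posterior)
    return total

best = min(p_yes, key=expected_entropy)
print(best)
```

The mushroom question wins because its answer separates the candidate diagnoses sharply, while “is it itchy?” barely moves the distribution.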

  • A thoughtful and provocative analysis!

    I like the idea of artificial stupidity (Kathryn Schulz’s recent book Being Wrong comes to mind).

    I also really like your observation about computers not being blinded by the obvious. I suspect that many human experts develop bigger blind spots as they become more expert.

    I haven’t followed the recommender system community closely lately, but I imagine insights from a seminal paper by Jon Herlocker, Joe Konstan and John Riedl from CSCW 2000 on Explaining Collaborative Filtering Recommendations may be helpful in this quest for greater transparency in human-machine problem solving:


  • I think that in a few years, after beta testing rolls out, Watson will be the norm for assisting doctors. Low cost, higher degree of knowledge. Makes sense. Computers taking people’s jobs. It’s already hard enough these days to get a job.

  • Makes me think that a program like this could be a great teaching tool in the spirit of Socrates’ question-and-answer method.

  • Randy

    The fact that a general internal medicine doctor earns from $170K to $220K after insurance, without even being a specialist, tells me that Watson will implode MD salaries for general practitioners.

    Insurers will not pay extra for an MD playing a souped-up nurse/PA, whose only function is to recommend a specialist, like a gastroenterologist or cardiologist, for a particular ailment.

    All in all, PAs and nurses will do the stitches and administer Tylenol with codeine, while the doctors will need to specialize to keep their salaries.