Last Tuesday, I attended a symposium at IBM Research about Watson, the computer that just defeated two of the strongest Jeopardy champions in history. Winning at Jeopardy is an extremely difficult problem for a computer: natural language processing (NLP) is difficult enough by itself, but in Jeopardy, you have to deal with language that is intentionally ambiguous. Puns, misleading statements, and irrelevant detail are the norm; topic categories usually have little to do with the actual questions.
But I was less interested in the NLP than in some of the other “thought processes” that go into Watson. Dr. David Ferrucci, who led the team that built Watson, talked about two elements that lie just under the surface. In Jeopardy, you see only questions and answers. But Watson (like many machine learning systems) arrives at an answer* by generating a large number of possible answers, each with an attached score, or confidence level. The answer with the highest confidence wins. Some strategy ties the confidence level to the decision of whether (and when) to buzz in, but that’s less interesting to me. What’s more important, as we think about real-world applications for Watson’s DeepQA technology, is the list of possible answers. Watson proved that it can make mistakes (and elsewhere, I’ve argued that “artificial stupidity” is an essential component of artificial intelligence). Once we realize that we’re not expecting a computer to dispense correctness, but to help us solve a real-world problem, the list of possibilities becomes relevant.
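To make “a list of possible answers, each with a confidence level” concrete, here is a minimal sketch. It is not DeepQA code: the `Candidate` class, the `rank` and `decide` functions, the 0.5 buzz threshold, and the scores are all hypothetical, chosen only to show that the winning answer is just the top of a ranked list.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    confidence: float  # made-up score in [0, 1]; not a real Watson value


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidate answers from most to least confident."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


def decide(candidates: list[Candidate], buzz_threshold: float = 0.5) -> tuple[Candidate, bool]:
    """Return the top candidate and whether it clears a hypothetical buzz threshold."""
    top = rank(candidates)[0]
    return top, top.confidence >= buzz_threshold


candidates = [
    Candidate("answer A", 0.12),
    Candidate("answer B", 0.71),
    Candidate("answer C", 0.40),
]
top, buzz = decide(candidates)
print(top.answer, buzz)

# The rest of the list is still there once the top answer has been chosen.
for c in rank(candidates):
    print(f"{c.confidence:.2f}  {c.answer}")
```

The design point is that `decide` throws nothing away: `rank` keeps every alternative, which is the list this post argues we should pay attention to.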
I recently read an article in the New York Times about a woman who had a rash that looked like poison ivy. It wasn’t poison ivy; it was an uncommon reaction to undercooked Shiitake mushrooms. How would we want a medical diagnostics application to behave in a puzzling case like this? Presented with a set of symptoms and a database of medical knowledge, its first suggestion would probably be “poison ivy,” as would the human MDs’. But unlike humans, computers aren’t blinded by the obvious. Once poison ivy is rejected, the next step is to look at the answers with lower confidence ratings. That’s where you would expect to see alternative explanations. How far down the list do you have to go before you get to the correct diagnosis? I don’t know, but rejecting the top answer doesn’t mean the search is finished.
The next level down in Watson’s analysis is even more interesting. The confidence level assigned to each answer comes from how well the answer matched various sources of information. Possible answers are scored against a number of data sources; these scores are weighted and combined to form the final confidence rating. If it is exposed to human users, the scoring process completely changes the kind of relationship we can have with machines. An answer is one thing; a series of alternative answers is something more; but when you’re looking at the reasons behind the answers, you’re finally getting at the heart of intelligence. I’m not going to talk about the Turing Test. But I am suggesting that, when you have the reasons for the alternative answers in hand, you’re suddenly looking at the possibility of a meaningful conversation between human and machine.
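The “weighted and combined” step can be illustrated with a simple weighted average. This is an assumption for illustration only, not DeepQA’s actual scoring model; the function `combine`, the source names `encyclopedia` and `news_archive`, and all numbers are hypothetical. What matters is that the function returns each source’s share of the final confidence along with the number itself, which is exactly the information a person would need to see the reasons behind an answer.

```python
def combine(source_scores: dict[str, float], weights: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Weighted average of per-source evidence scores.

    Returns the overall confidence plus each source's contribution to it,
    so the reasons behind the number can be shown to a person.
    """
    total_weight = sum(weights[source] for source in source_scores)
    contributions = {
        source: weights[source] * score / total_weight
        for source, score in source_scores.items()
    }
    return sum(contributions.values()), contributions


# Made-up evidence scores for one candidate answer, from two hypothetical sources.
scores = {"encyclopedia": 0.9, "news_archive": 0.3}
weights = {"encyclopedia": 2.0, "news_archive": 1.0}

confidence, why = combine(scores, weights)
print(f"confidence = {confidence:.2f}")
for source, share in sorted(why.items(), key=lambda kv: -kv[1]):
    print(f"  {source}: {share:.2f}")
```

With these numbers the confidence comes out to about 0.70, and the breakdown shows that most of it (about 0.60) comes from the encyclopedia.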
Let’s think about the mushrooms again, and imagine the conversation that might take place. Answer #1, poison ivy, has been ruled out. So the humans can ask Watson “What are the other alternatives?” Perhaps answer #4 is Shiitake mushrooms. Now the humans can ask “What makes you think it’s mushrooms?” and see how the answer’s score was generated, and how it compares to other highly ranked answers. Once you have the criteria on which the confidence level is based, you can ask follow-up questions like “Would you change your mind if you weighted the data sources differently?” “How would your answer change if we told you the rash wasn’t itchy?” “What other tests should we do to confirm or reject the hypothesis?”
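Two of those follow-up questions, “What are the other alternatives?” and “Would you change your mind if you weighted the data sources differently?”, amount to re-ranking the same candidates with a rejected answer removed or with different weights. The sketch below assumes a simple weighted average; the diagnoses, the source names `textbook` and `case_reports`, and every score are invented for illustration and are not medical data or Watson output.

```python
def confidence(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of one candidate's per-source scores (missing sources count as 0)."""
    total = sum(weights.values())
    return sum(w * scores.get(source, 0.0) for source, w in weights.items()) / total


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float],
         ruled_out: set[str] = frozenset()) -> list[str]:
    """Rank the candidates that have not been ruled out, most confident first."""
    live = {name: s for name, s in candidates.items() if name not in ruled_out}
    return sorted(live, key=lambda name: confidence(live[name], weights), reverse=True)


# Invented per-source scores for three hypothetical diagnoses.
candidates = {
    "poison ivy": {"textbook": 0.9, "case_reports": 0.2},
    "eczema": {"textbook": 0.6, "case_reports": 0.3},
    "shiitake reaction": {"textbook": 0.1, "case_reports": 0.9},
}
default_weights = {"textbook": 2.0, "case_reports": 1.0}
what_if_weights = {"textbook": 1.0, "case_reports": 2.0}  # "what if you trusted case reports more?"

print(rank(candidates, default_weights))
print(rank(candidates, default_weights, ruled_out={"poison ivy"}))
print(rank(candidates, what_if_weights, ruled_out={"poison ivy"}))
```

Under the default weights, ruling out poison ivy leaves eczema on top; shifting weight toward case reports moves the mushroom hypothesis to first place. The data never changes, only the question asked of it.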
You can see where this is leading. The top-rated answer is less important than the process that leads up to the answer. And as we go beyond strict question-and-answer situations, we find applications that are all about the process. Take LinkedIn’s scarily accurate recommendations for connections. They use machine learning to figure out who, in their database, you’re likely to know. But when they recommend someone you don’t already know, you might want to ask “Why did you recommend him?” And the answer could be very interesting: shared interests, mutual friends. All of a sudden, “people you may know” becomes “people you might like to know,” “people you might find interesting,” “people you should ask if you have a question about X.” That’s a different and more intriguing kind of service. And it’s potentially conversational in a way that current human-computer interaction isn’t. Beyond sites like Stack Overflow, useful as it is, we could build sites that support a conversation about problem solving: asking and refining questions, searching across many different sources of information, and evaluating different answers. Again, the point isn’t the answer, but the conversation enabled by looking “under the covers” at how Watson works.
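A recommender that can answer “Why did you recommend him?” only needs to keep the evidence it used to score each suggestion. The sketch below is not LinkedIn’s algorithm; the scoring rule (one point per mutual friend, half a point per shared interest), the `Recommendation` class, and the tiny network are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    person: str
    score: float
    mutual_friends: list[str]    # the "reasons" we can show the user
    shared_interests: list[str]


def recommend(user: str, friends: dict[str, set[str]],
              interests: dict[str, set[str]], top_n: int = 3) -> list[Recommendation]:
    """Suggest people the user isn't connected to, keeping the evidence for each suggestion."""
    already_known = friends[user] | {user}
    results = []
    for other in friends:
        if other in already_known:
            continue
        mutual = friends[user] & friends[other]
        shared = interests[user] & interests[other]
        # Hypothetical scoring rule: mutual friends count double shared interests.
        score = len(mutual) + 0.5 * len(shared)
        if score > 0:
            results.append(Recommendation(other, score, sorted(mutual), sorted(shared)))
    results.sort(key=lambda r: (-r.score, r.person))
    return results[:top_n]


# A tiny made-up network with symmetric friendships.
friends = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "dee"},
    "cy": {"ana", "dee", "eli"},
    "dee": {"bo", "cy"},
    "eli": {"cy"},
}
interests = {
    "ana": {"nlp", "jeopardy"},
    "bo": {"golf"},
    "cy": {"nlp"},
    "dee": {"nlp", "jeopardy"},
    "eli": {"golf"},
}

for rec in recommend("ana", friends, interests):
    print(f"{rec.person}: friends {rec.mutual_friends}, interests {rec.shared_interests}")
```

Because each `Recommendation` carries its mutual friends and shared interests, the interface can present “dee” as someone who shares your interest in NLP, not just someone you may know.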
Now that we can build machines that can answer tough and ambiguous questions, the next step is to realize that the answer to a question isn’t the end of the process. What characterizes human thought isn’t the ability to produce answers, but the process that leads up to the answer: assembling and evaluating alternatives, both internally and in conversation with others. So when we’re thinking about Watson, we need to ask what we can do with the information that led up to the answer. What kinds of user interfaces will we need to interact with the layers of knowledge underneath the answer? How do you craft a user experience for conversing with a machine? This is the future of machine learning: not just answers, but the first steps towards a conversation. That’s where Watson is leading us.
* I must revert to normal terminology. In Jeopardy, of course, the contestants are given answers and have to come up with questions.
Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.