# Watson and the future of machine learning

## Watson opens the door to conversations, not just answers.

Last Tuesday, I attended a symposium at IBM Research about Watson, the computer that just defeated two of the strongest Jeopardy champions in history. Winning at Jeopardy is an extremely difficult problem for a computer: natural language processing (NLP) is difficult enough by itself, but in Jeopardy, you have to deal with language that is intentionally ambiguous. Puns, misleading statements, and irrelevant detail are the norm; topic categories usually have little to do with the actual questions.

But I was less interested in the NLP than in some of the other “thought processes” that go into Watson. Dr. David Ferrucci, Watson’s inventor, talked about two elements that are just under the surface. In Jeopardy, you see only questions and answers. But Watson (like many machine learning systems) generates an answer* by generating a large number of possible answers, each attached to a score, or confidence level. The highest confidence level wins. Some strategy ties the confidence level to whether (and when) to buzz in, but that’s less interesting to me. What’s more important, as we think about real-world applications for Watson’s DeepQA technology, is the list of possible answers. Watson proved that it can make mistakes (and elsewhere, I’ve argued that “artificial stupidity” is an essential component of artificial intelligence). Once we realize that we’re not expecting a computer to dispense correctness, but to help us solve a real-world problem, the list of possibilities becomes relevant.
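The candidate-and-confidence idea is easy to sketch. The following is a minimal, hypothetical illustration (the candidates, scores, and buzz threshold are all invented; this is not IBM's DeepQA):

```python
def best_answer(candidates, buzz_threshold=0.5):
    """candidates: list of (answer, confidence) pairs.

    Pick the highest-confidence candidate and decide whether the
    confidence is high enough to 'buzz in' at all.
    """
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    should_buzz = confidence >= buzz_threshold
    return answer, confidence, should_buzz

# The top candidate wins, but the whole ranked list is still available.
candidates = [
    ("Chicago", 0.32),
    ("Toronto", 0.14),
    ("Beijing", 0.08),
]
print(best_answer(candidates))  # ('Chicago', 0.32, False)
```

The point of the sketch is that the losing candidates aren't thrown away: the full scored list is what becomes useful once the top answer turns out to be wrong.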

I recently read an article in the New York Times about a woman who had a poison-ivy-like rash — but it wasn’t poison ivy; it was an uncommon reaction to undercooked shiitake mushrooms. How would we want a medical diagnostics application to behave in a puzzling case like this? Presented with a set of symptoms and a database of medical knowledge, its first suggestion would probably be “poison ivy,” just like the human MDs. But unlike humans, computers aren’t blinded by the obvious. Once poison ivy is rejected, the next step is to look at the answers with lower confidence ratings. That’s where you would expect to see alternative explanations. How far down the list do you have to go before you reach the correct diagnosis? I don’t know, but the search isn’t finished until you get there.

The next level down in Watson’s analysis is even more interesting. The confidence level assigned to each answer comes from how well the answer matched various sources of information. Possible answers are scored against a number of data sources; these scores are weighted and combined to form the final confidence rating. If exposed to human users, the scoring process completely changes the kind of relationship we can have with machines. An answer is one thing; a series of alternate answers is something more; but when you’re looking at the reasons behind the answers, you’re finally getting at the heart of intelligence. I’m not going to talk about the Turing Test. But I am suggesting that, when you have the reasons for the alternative answers in hand, you’re suddenly looking at the possibility of a meaningful conversation between human and machine.
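A weighted combination of per-source scores can be sketched in a few lines. This is a hypothetical simplification (the source names and weights are invented, and real systems learn the weights rather than hand-setting them), but it shows how the per-source breakdown doubles as the “reasons” behind an answer:

```python
def score_candidate(evidence, weights):
    """evidence: {source_name: match score in [0, 1]} for one candidate.
    weights:  {source_name: how much we trust that source}.

    Returns (confidence, breakdown), where breakdown lists each
    source's weighted contribution -- the explanation of the score.
    """
    contributions = {s: evidence[s] * weights.get(s, 0.0) for s in evidence}
    total_weight = sum(weights.get(s, 0.0) for s in evidence) or 1.0
    confidence = sum(contributions.values()) / total_weight
    # Sort sources by contribution so the strongest reasons come first.
    breakdown = sorted(contributions.items(), key=lambda kv: -kv[1])
    return confidence, breakdown

conf, reasons = score_candidate(
    {"encyclopedia": 1.0, "news archive": 0.5},
    {"encyclopedia": 1.0, "news archive": 1.0},
)
print(conf)     # 0.75
print(reasons)  # [('encyclopedia', 1.0), ('news archive', 0.5)]
```

Surfacing `breakdown` alongside `confidence` is exactly the kind of transparency the paragraph above argues for: the user sees not just the score, but which sources drove it.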

* I must revert to normal terminology. In Jeopardy, of course, the contestants are given answers and have to come up with questions.

Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.

Related:

• Alex Tolley

Good perspective, and a welcome diversion from the articles about how Watson won unfairly, yada, yada.

The obvious fictional parallel to your thoughts is the failure of Bowman and Poole to question HAL about why he thought the AE35 unit was about to fail. Similarly, while Asimov’s Liar! had Herbie mind reading, how would we deal with computers that learned to take into account human feelings and responses?

IBM wants Watson to become a medical assistant, much like your diagnosis example. The assistant plays Spock to the physician’s Kirk. But at some point, its role will become the physician’s. Diagnoses will inevitably lead to suggesting treatments, and then the machine will need to take into account and weigh other factors.

Watson-like assistants could find a host of roles to play in the future, opening up all sorts of possibilities for us to learn about the world and of course fall into new traps.

• There is the small issue that Jeopardy! has a format that suits the approach that Watson takes, or rather suits statistical AI. Being provided with the answer and having to find the question isn’t the same as being asked a question and finding the answer. In Jeopardy! the clue is long and has lots of information that allows the algorithms to pin down an entry in a knowledge base corresponding to candidate items that can then be used to form a simple question with few words – like “What is Toronto?”

It often doesn’t even matter that the form of the question isn’t quite right for the answer given as the clue. Watson isn’t a question answering machine – it is an answer questioning machine and it could well be that this is the easier of the two directions. In short, unless IBM can find applications that mimic Jeopardy!, Watson might well be a very expensive dead end.

I discuss this more in the post:
http://www.i-programmer.info/programming/artificial-intelligence/2012-watson-wins-jeopardy-trick-or-triumph-.html

• Edward

Nice post

Similarly, it would be useful for Google to tell me, “I had four different ways of understanding your search, and interpreted it as follows”; then I could select a different analysis of my query to get accurate results.

In other words, search engines don’t need to just churn out a result, which is very difficult because then the search engine has to understand what you are really looking for (e.g. if I search for Beethoven, do I mean the composer or something else entirely?). They can be interactive and ask you what you mean.
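The policy this comment proposes could be sketched as: answer directly when one interpretation clearly dominates, otherwise surface the alternatives. A hypothetical sketch (the threshold, readings, and confidences are all invented):

```python
def disambiguate(query, interpretations, margin=0.3):
    """interpretations: list of (reading, confidence) pairs.

    If the top reading beats the runner-up by a clear margin, use it;
    otherwise show the alternatives and let the user choose.
    """
    ranked = sorted(interpretations, key=lambda pair: -pair[1])
    top, runner_up = ranked[0], ranked[1]
    if top[1] - runner_up[1] > margin:
        return top[0]  # clear winner: just answer
    print(f"I understood {query!r} in {len(ranked)} different ways:")
    for i, (reading, conf) in enumerate(ranked, 1):
        print(f"  {i}. {reading} (confidence {conf:.2f})")
    return None  # caller prompts the user to pick one

disambiguate("Beethoven", [("the composer", 0.45), ("the 1992 film", 0.40)])
```

With two nearly tied readings, the sketch falls back to asking; with a dominant one, it answers directly.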

• I think that there is another level of artificial intelligence that could be deduced from this. Given that Watson has a ‘complete’ set of answers and we can assess the statistical difference between them, wouldn’t it make sense to ask Watson what questions you could ask a patient to increase the statistical separation and therefore generate a more confident conclusion? Using the above example, after poison ivy had been excluded, you might ask Watson what question would narrow down the cause. Watson might suggest asking whether the patient has eaten any shiitake mushrooms recently, knowing that an affirmative answer might increase the probability of the correct diagnosis significantly.
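The idea in this comment is essentially expected-information-gain question selection, which can be sketched directly. This is a toy illustration under invented numbers (two candidate diagnoses, yes/no questions, made-up likelihoods), not anything Watson actually does:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def best_question(prior, likelihoods):
    """prior: {diagnosis: P(diagnosis)}.
    likelihoods: {question: {diagnosis: P(answer is 'yes' | diagnosis)}}.

    Returns the question whose yes/no answer is expected to leave the
    least remaining uncertainty over the diagnoses.
    """
    def expected_entropy(q):
        p_yes_given = likelihoods[q]
        p_yes = sum(prior[d] * p_yes_given[d] for d in prior)
        remaining = 0.0
        for p_answer, lik in ((p_yes, lambda d: p_yes_given[d]),
                              (1 - p_yes, lambda d: 1 - p_yes_given[d])):
            if p_answer == 0:
                continue
            posterior = [prior[d] * lik(d) / p_answer for d in prior]
            remaining += p_answer * entropy(posterior)
        return remaining
    return min(likelihoods, key=expected_entropy)

prior = {"shiitake reaction": 0.5, "contact dermatitis": 0.5}
likelihoods = {
    # Splits the candidates well: very likely 'yes' for one diagnosis only.
    "ate shiitake mushrooms recently?": {"shiitake reaction": 0.95,
                                         "contact dermatitis": 0.10},
    # Uninformative: both diagnoses predict 'yes' equally.
    "does the rash itch?": {"shiitake reaction": 0.90,
                            "contact dermatitis": 0.90},
}
print(best_question(prior, likelihoods))  # ate shiitake mushrooms recently?
```

The mushroom question wins because its answer separates the two diagnoses, while the itching question leaves the distribution unchanged.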

• A thoughtful and provocative analysis!

I like the idea of artificial stupidity (Kathryn Schulz’s recent book “Being Wrong” comes to mind).

I also really like your observation about computers not being blinded by the obvious. I suspect that many human experts develop bigger blind spots as they become more expert.

I haven’t followed the recommender system community closely lately, but I imagine insights from a seminal paper by Jon Herlocker, Joe Konstan and John Riedl from CSCW 2000 on Explaining Collaborative Filtering Recommendations may be helpful in this quest for greater transparency in human-machine problem solving:

http://www.grouplens.org/node/220

• I think that in a few years, after beta testing rolls out, Watson-style assistants will be the norm for doctors. Lower cost, a higher degree of knowledge: it makes sense. But it’s also computers taking people’s jobs, and it’s already hard enough these days to get a job.

• Makes me think that a program like this could be a great teaching tool in the spirit of the Socratic question-and-answer method.

• Randy

The fact that a general internal medicine doctor earns from $170K to $220K, after insurance, without even being a specialist, tells me that Watson will implode the salaries of general practitioners.

Insurers will not pay extra for an MD playing a souped-up nurse/PA whose only function is to recommend a specialist, like a gastroenterologist or cardiologist, for a particular ailment.

All in all, PAs and nurses will do the stitches and administer Tylenol with codeine, while the doctors will need to specialize to keep their salaries.