Kyle Dent

Kyle manages the Conversational Human-Agent Technology (CHAT) research area at the Palo Alto Research Center (formerly Xerox PARC). CHAT develops techniques to understand conversational interaction, drawing on many disciplines, including several aspects of computer science, linguistics, and sociology. The group’s work focuses on improving human-to-machine interfaces as well as analyzing many types of human language interactions, from online social media to audio recordings and their transcriptions.

Dialects of the IoT

How intimately we talk to our stuff depends on what it’s done for us lately.

In the first post in this series, I mentioned that we’re getting used to talking to technology. We talk to our cell phones, our cars; some of us talk to our TVs, and a lot of us talk to customer support systems. The field has yet to settle into a state of equilibrium, but I thought I would take a stab at defining some categories of conversational interfaces.

There is, of course, quite a range of intelligent assistants, but I want to consider specifically different types of conversational interactions with technology. For example, you might have an intelligent agent that can arrange meetings, figuring out attendees’ availability and even sending meeting requests. Certainly, that’s a useful and intelligent agent, but working with it doesn’t necessarily require any conversational interaction.

Classifying conversational interfaces

As usual with these kinds of things, the boundaries can be fuzzy. So, a particular piece of technology can have aspects of multiple categories, but here’s what I propose.

Voice interfaces: Understand a few set phrases

The most basic level of speech interaction is the simple voice interface, which lets you control devices or software by speaking commands. Generally, these systems have a fixed set of actions. Saying a word or phrase is akin to using a menu system, but instead of clicking on menu items, you speak them. You find these in cars with voice commands and Bluetooth interfaces to make phone calls or play music. It’s the same kind of system when you call into a phone tree that routes you to a particular department or person. Some of these systems allow for variations in how you say something, but for the most part, they will only understand words or phrases from a predefined list.
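
To make the menu analogy concrete, here is a minimal sketch in Python of how such a fixed-phrase interface might map recognized text to actions. It assumes the speech has already been transcribed to text; the phrases, action names, and the functions normalize and handle are all hypothetical and are not taken from any real product.

```python
from typing import Optional

# Hypothetical command table: each predefined phrase maps to one action name.
COMMANDS = {
    "call home": "dial_home",
    "play music": "start_playback",
    "next track": "skip_track",
}


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(utterance.lower().split())


def handle(utterance: str) -> Optional[str]:
    """Return the action for a phrase on the list, or None if it is not there."""
    return COMMANDS.get(normalize(utterance))
```

In this sketch, handle("Play  Music") finds the "play music" entry, while a request phrased any other way, such as "could you put on some jazz", returns None. That is the limitation described above: anything outside the predefined list is simply not understood.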

Talking to the IoT

When our stuff speaks to us, we exchange more than ideas.

People are really good at talking to each other. That shouldn’t be too surprising. Conversation among human beings has evolved over a very long period of time. Now we’re starting to talk to our stuff, and in some cases, it’s talking back.

Asking Siri (or Cortana or Google Now) some simple questions is just the beginning of what’s coming. In fact, we’re in the midst of a significant shift in voice and conversation technology. Companies like Amazon, Facebook, and Google are falling over each other to hire researchers and acquire related companies, and they are starting to use this talent in new and interesting ways.

This is the first post in a series of articles I’ll use to explore speech and conversational interfaces. The subject will be dialog systems in general, with a focus on the intelligent interfaces we can expect to see more of in the future. Other topics could include:

  • Design considerations for spoken language systems
  • Emerging research in the area
  • Changes to how we interact with technology plus the social impact they might have

If you’re someone with a finely tuned hype radar, some skepticism about just how good these technologies might be is understandable. Most of the speech-to-text and automated telephone interactions available up to this point have been frustrating to use. People regularly share tips for short-circuiting interactive voice response (IVR) trees (I hear swearing helps!). And even Siri can seem clueless a lot of the time.