Buttons were an inspired UI hack, but now we've got better options

If you’ve ever seen a child interact with an iPad, you’ve seen the power of the touch interface in action. Is this a sign of what’s to come? Will we be touching and swiping screens rather than tapping buttons? I reached out to Josh Clark (@globalmoxie), founder of Global Moxie and author of “Tapworthy,” to get his thoughts on the future of touch and computer interaction, and whether or not buttons face extinction.

Clark says a touch-based UI is more intuitive to the way we think and act in the world. He also says touch is just the beginning: speech, facial expression, and physical gestures are on the way, and we need to start thinking about content in these contexts.

Clark will expand on these ideas at Mini TOC Austin on March 9 in Austin, Texas.

Our interview follows.

Are we close to seeing the end of buttons?

Josh Clark: I frequently say that buttons are a hack, and people sometimes take that the wrong way. I don’t mean it in a particularly negative way. I think buttons are an inspired hack, a workaround that we’ve needed just to get stuff done. That’s true in the real world as well as the virtual: A light switch over here to turn on a light over there isn’t especially intuitive, and it’s something that has to be learned, and re-learned for every room we walk into. That light switch introduces a middle man, a layer of separation between the action and the thing you really want to work on, which is the light. The switch is a hack, but a brilliant one because it’s just not practical to climb up a ladder in a dark room to screw in the light bulb.

Buttons in interfaces are a similar kind of hack — an abstraction we’ve needed to make the desktop interface work for 30 years. The cursor, the mouse, buttons, tabs, menus … these are all prosthetics we’ve been using to wrangle content and information.

With touchscreen interfaces, though, designers can create the illusion of acting on information and content directly, manipulating it like a physical object that you can touch and stretch and drag and nudge. Those interactions tickle our brains in different ways from how traditional interfaces work because we don’t have to process that middle layer of UI conventions. We can just touch the content directly in many cases. It’s a great way to help cut through complexity.

The result is so much more intuitive, so much more natural to the way we think and act in the world. The proof is how quickly people with no computing experience, people like toddlers and seniors, take to the iPad. They’re actually better with these interfaces than the rest of us because they aren’t poisoned by 30 years of desktop interface conventions. Follow the toddlers; they’re better at it than we are.

So, yes, in some contexts, buttons and other administrative debris of the traditional interface have run their course. But buttons remain useful in some contexts, especially for more abstract tasks that aren’t easily represented physically. The keyboard is a great example, as are actions like “send to Twitter,” which don’t have readily obvious physical components. And just as important, buttons are labeled with clear calls to action. As we turn the corner into popularizing touch interactions, buttons will still have a place.

Mini TOC Austin: On March 9, 2012, right before SXSW, O’Reilly Tools of Change presents Mini TOC Austin, a one-day event focusing on Austin’s thriving publishing, tech, and bookish-arts community.

What kinds of issues do touch- and gesture-oriented interfaces present?

Josh Clark: There are issues for both designers and users. In general, if a touchscreen element looks or behaves like a physical object, people will try to interact with it like one. If your interface looks like a book, people will try to turn its pages. For centuries, designers have dressed up their designs to look like physical objects, but that’s always just been eye candy in the past. With touch, users approach those designs very differently; they’re promises about how the interface works. So designers have to be careful to deliver on those promises. Don’t make your interface look like a book, for example, if it really works through desktop-like buttons. (I’m looking at you, Contacts app for iPad.)

So, you can create really intuitive interfaces by making them look or behave like physical objects. That doesn’t mean that everything has to look just like a real-world object. Windows Phone and the forthcoming Windows 8 interface, for example, use a very flat tile-like metaphor. It doesn’t look like a 3-D gadget or artifact, but it does behave with real-world physics. It’s easy to figure out how to slide and work the content on the screen. People figure that stuff out really quickly.

The next hurdle — and the big opportunity for touch interfaces — is moving to more abstract gestures: two- and three-finger swipes, a full-hand pinch, and so on. In those cases, gestures become the keyboard shortcuts of touch and begin to let you create applications that you play more than you use, almost like an instrument. But wait, here I am talking about abstract gestures; didn’t I just say that abstractions — like buttons — are less than ideal? Well, yeah, the trouble is you don’t want to have the overhead of processing an interface, of thinking through how it works. The thing about physical abstractions (like gestures) versus visual abstractions (like buttons) is that physical actions can be absorbed into muscle memory. That kind of subconscious knowledge is actually much faster than visual processing — it’s why touch typists are so much faster than people who visually peck at the keys. So, once you learn and absorb those physical actions — a two-finger swipe always does this or that — then you can actually move really quickly through an interface in the same way a pianist or a typist moves through a keyboard. Intent fluidly translated to action.

But how do you teach that stuff? Swiping a card, pinching a map, or tapping a photo are all based on actions we know from the physical world. But a two-finger swipe has no prior meaning. It’s not something we’ll guess. Gestures are invisible with no labels, so that means they have to be taught.

Screenshot from Apple’s built-in trackpad tutorial.

In what ways can UI design alleviate these learning issues?

Josh Clark: Designers should approach this by thinking through how we learn any physical action in the real world: observation of visual cues, demonstration, and practice. Too often, designers fall back on instruction manuals (iPad apps that open with a big screen of gesture diagrams) or screencasts. Neither is very effective.

Instead, designers have to do a better job of coaching people in context, showing our audiences how and when to use a gesture in the moment. More of us need to study video game design because games are great at this. In so many video games, you’re dropped into a world where you don’t even know what your goal is, let alone what you’re capable of or what obstacles you might encounter. The game rides along with you, tracking your progress, taking note of what you’ve encountered and what you haven’t, and giving in-context instruction, tips, and demonstrations as you go. That’s what more apps and websites should do. Don’t wait for people to somehow find a hidden gesture shortcut; tell people about it when they need it. Show an animation of the gesture and wait for them to copy it. Demonstration and practice — that’s how we learn all physical actions, from playing an instrument to learning a tennis serve.

How do you see computer interaction evolving?

Josh Clark: It’s a really exciting time for interaction design because so many new technologies are becoming mature and affordable. Touch got there a few years ago. Speech is just now arriving. Computer vision, with face recognition and gesture recognition like Kinect, is coming along. So, we have all these areas where computers are learning to understand our particularly human forms of communication.

In the past, we had to learn to act and think like the machine. At the command line, we had to write in the computer’s language, not our own. The desktop graphical user interface was a big step forward in making things more humane through visuals, but it was still oriented around how computers saw the world, not humans. When you consider the additions of touch, speech, facial expression, and physical gesture, you have nearly the whole range of human (and humane) communication tools. As computers learn the subtleties of those expressions, our interfaces can become more human and more intuitive, too.

Touchscreens are leading this charge for now, but touch isn’t appropriate in every context. Speech is obviously great for the car, for walking, for any context where you need your eyes elsewhere. We’re going to see interfaces that use these different modes of communication in context-appropriate combinations. But that means we have to start thinking hard about how our content works in all these different contexts. So many are struggling just to figure out how to make the content adapt to a smaller screen. How about how your content sounds when spoken? How about when it can be touched, or how it should respond to physical gestures or facial expressions? There’s lots of work ahead.

Are Google’s rumored heads-up-display glasses a sign of things to come?

Josh Clark: I’m sure that all kinds of new displays will have a role in the digital future. I’m not especially clever about figuring out which technology will be a huge hit. If someone had told me five years ago that the immediate future would be all about a glass phone with no buttons, I’d have said they were nuts. I think both software and context and, above all, human empathy make the difference in how and when a hardware technology becomes truly useful. The stuff I’ve seen of the heads-up-display glasses seems a bit awkward and unnatural. The twitchy way you have to move your head to navigate the screen seems to ask you to behave a little robot-like. I think trends and expectations are moving in an opposite direction — technology that adapts to human means of expression, not humans adapting to technology.

This interview was edited and condensed.
