Living with Alexa: the problems with “voice” as an interface

Alexa represents a growing trend. The big tech players are investing heavily in voice-control (in order of current competence: Amazon, Google, Microsoft, Apple). They see voice as the next logical interface for technology. Gone are the days, they argue, when we had to do anything quite so quaint as type on a keyboard. And in many ways, they’re right: being able to say what you want and get an immediate answer can be fantastically useful. But it doesn’t always beat a good-old-fashioned keyboard.

I’ve lived with Alexa (the virtual assistant held within the Amazon Echo) for six months now. The experience has changed how I feel about voice-as-an-interface. Here are a few of my observations:

  • Failure states need a lot of work, and Alexa hasn’t got it right yet. The “I’m sorry, I’m having trouble understanding right now” message gets more annoying than a 404 or 500 error page ever does. Also, defaulting to a web-search is not the best option for most queries. Precisely when you didn’t want to have to mess about, the interaction becomes longer and more complicated.
  • Text input is often more precise (and certainly more concise). Natural Language Processing has its limits, and sometimes you don’t want the hassle of converting your abstract thought into prose.
  • You can’t browse as easily – it’s quicker to read than to listen, and easier to bounce-between and assess options. You can’t “skim-listen”.
  • Half the time you feel like an idiot talking aloud to your devices.
  • It’s not discreet: I often want to multi-task, particularly in boring meetings. (This is more of a future-worry, as I currently only have voice-enabled stuff in my house.)
  • Alexa isn’t able to distinguish between individual voices. We want (and expect) her to work in all scenarios: whether we’ve got a cold, or are whispering, or shouting, or in the next room. As a result, natural-language interpretation is very good, but it responds to any voice. I think Alexa needs to know who her “boss” is…
  • Following on from that, security is non-existent. My friends can hijack the party playlist. Adverts can inadvertently purchase things from Amazon (this has not happened to us yet, but Alexa has woken up because of wake-words being spoken on the telly).
  • There is a palpable shortage of “skills” (the Alexa-speak term for what other platforms would call apps), and the ones that are available are either very clunky or too US-centric. A16Z analyst Benedict Evans thinks it’s not surprising that most “skills” go unused: “It’s hard enough to remember what apps you have when you can see them on a screen – having to remember to ask for them by name is the new command line”.
  • Amazon are doing their best to kick-start a “skills” ecosystem. They have an enthusiastic developer-outreach programme with a big marketing budget (based on the targeted ads I can’t seem to escape from…). They’ve also opened up the technology, so developers can run the (Java-based) Alexa code on their own machines.
  • Music has been the killer app for us. Being able to select a song, pause, and skip without having to open any apps or type anything is great. It’s eliminated a friction I didn’t even know was there. Selecting music manually now feels archaic. Porting my (large) music library into Amazon Music was a pain (and I’m still not finished syncing everything). It also cost £25 to get enough storage, but being able to select specific artists and songs was totally worth it. The default music library you get with Prime is fairly comprehensive, but Amazon definitely wants us all to sign up for their music streaming service (not my bag at all).
  • I was surprised how often I’d simply say “Alexa, play some music”. After a little training period (“Alexa, I don’t like this!”) the resulting playlists were actually pretty good.
  • The “always on” nature of the Echo, while creepy, is one of the most game-changing aspects. It’s the factor that tips Alexa from a novelty to a utility. I just speak into the air, and she responds. No boot-up time, no log-in required. She just works.
  • The speech detection is excellent. We have the Echo in the living room (tucked behind the telly) and it can still hear me when I’m in the kitchen, even when there are other people in both rooms. Alexa gets easily confused when multiple people try to talk at once (or talk over each other, which happens a lot when the family are visiting). But I get easily confused when that happens too, so I can’t judge the Echo too harshly on this score.
  • Getting answers to trivia & general-knowledge questions might be the most useful feature, but the music features are still more impressive when showing off to friends.