Getting ready for the voice-first era is currently a top priority for leading companies around the globe. From Amazon Alexa’s $100 million Alexa Fund, to The Slack Fund, an $80 million investment fund, there’s a lot of support for startups exploring voice technology.
Whether or not you believe the Forbes article that claims 2017 will be the year of voice, you should at least be paying attention.
Already, more than half of US teens and 41% of US adults use voice search on a daily basis, and its use continues to grow every day, according to a study conducted by Northstar Research.
What’s more, by 2020, 50% of all searches will be voice searches according to comScore. Voice will be the next big frontier in technology—bigger than the mobile app economy ever was.
But how many businesses are ready for this shift? And who will get them there?
You’ve probably guessed it. It will be you—the designer!
In order to prepare designers for this new paradigm, CareerFoundry has launched a comprehensive Voice User Interface Design course in collaboration with Amazon Alexa.
Here are 5 actionable tips for your first voice design project:
Create a personality, not just a persona
This is a new secret weapon for your brand.
Voice interface is a whole new avenue to take advantage of—you can literally give your brand a voice.
When designing the experience, you should consider that the user is conversing with a unique character. This will help you think of creative ways to delight your users.
You can expect drastically different reactions from users depending on whether the voice is male or female, old or young, which dialect it speaks, how fast it talks, and, of course, what it actually says.
The system persona (or personality if you will) is one of the most creative, fun, and challenging parts that designers will be tasked with getting right!
Remember that we don’t talk the way we type
Make sure your system understands natural speech (i.e. full sentences or questions, rather than a series of keywords).
Whereas we might type “best pizza Chicago,” we’re more likely to ask our voice service something like, “Alexa, can you find me a good pizza place in Chicago?”
Deciphering human speech has been one of the biggest challenges in creating seamless voice interactions. Now that machines are getting better at it, designers have the exciting task of ensuring that they understand and can respond to thousands of different commands.
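To make this concrete, here is a minimal sketch of how several phrasings of the pizza request might map onto a single intent. It is a toy pattern matcher, not how Alexa or any real voice service works. The intent name `FindRestaurant`, the slot names `food` and `city`, and the sample utterances are all assumptions made up for illustration.

```python
import re

# Hypothetical intent and sample utterances; {food} and {city} are slots.
INTENT_NAME = "FindRestaurant"
SAMPLE_UTTERANCES = [
    "can you find me a good {food} place in {city}",
    "where can i get {food} in {city}",
    "best {food} {city}",
]


def to_pattern(utterance):
    # Turn each "{slot}" placeholder into a named regex group.
    return re.compile(re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", utterance))


PATTERNS = [to_pattern(u) for u in SAMPLE_UTTERANCES]


def normalize(text):
    # Lowercase, drop punctuation, strip the wake word, collapse spaces.
    text = re.sub(r"[^a-z\s]", "", text.lower()).strip()
    text = re.sub(r"^alexa\s+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def match_intent(text):
    # Return the intent and its slot values, or None if nothing matches.
    cleaned = normalize(text)
    for pattern in PATTERNS:
        match = pattern.fullmatch(cleaned)
        if match:
            return {"intent": INTENT_NAME, "slots": match.groupdict()}
    return None


spoken = match_intent("Alexa, can you find me a good pizza place in Chicago?")
typed = match_intent("best pizza Chicago.")
print(spoken == typed)
```

The spoken question and the typed keywords end up as the same intent with the same slots. The design work lies in collecting enough sample phrasings to cover how people actually talk.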
On the flipside, we should remember that some things just aren’t compatible with speech. For example, imagine if you had to use your voice to enter your password. It’s likely to be a strange combination of words, letters, and numbers—easy to type but hard to say.
Also, letters like “P” and “B” in English are frequently confused with each other. Speech recognition systems can have trouble getting them right when they’re completely devoid of context, as in a password.
Adapt your user flows logically
You are most likely aware of the importance of user flows when building out a visual design experience.
These user flows show the fundamental logic of a system, including the different key goals that a user might want to achieve (and that the business wants them to achieve).
So, for a music streaming website, the typical user flows might be signing up, or logging in and playing a specific playlist, or browsing recommended songs and creating a playlist.
Loosely, we can call this the “what.” The “how” comes later. The “how” in voice will be the utterances and the responses, rather than top navs, search bars, and dropdowns.
But before we get to this, we want to check that the “what” still applies. In our music streaming example, we might realize that we want to reduce the number of user flows on our voice device. Making a playlist via voice, for instance, might be tedious compared to doing it on a graphical interface.
This mirrors the tendency for designers to recommend reduced functionality on mobile apps compared to desktop, based on demand and also practicality.
In summary, it’s about determining which functionality makes sense and which doesn’t before you move on to the details.
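One rough way to run this check is to list each flow with an estimate of how many back-and-forth exchanges it would take by voice, then filter. The sketch below assumes a cutoff of three exchanges and uses invented estimates for the music streaming flows. Both are placeholders you would replace with numbers from your own user research.

```python
from dataclasses import dataclass


@dataclass
class UserFlow:
    name: str
    voice_turns: int      # estimated exchanges needed by voice (invented numbers)
    needs_browsing: bool  # does the user need to scan many options visually?


MAX_VOICE_TURNS = 3  # assumed cutoff, not an established rule


def voice_friendly(flows, max_turns=MAX_VOICE_TURNS):
    # Keep flows that are short enough and don't depend on visual scanning.
    return [
        flow.name
        for flow in flows
        if flow.voice_turns <= max_turns and not flow.needs_browsing
    ]


flows = [
    UserFlow("play a specific playlist", voice_turns=1, needs_browsing=False),
    UserFlow("browse recommended songs", voice_turns=2, needs_browsing=True),
    UserFlow("create a playlist", voice_turns=8, needs_browsing=False),
]

print(voice_friendly(flows))  # ['play a specific playlist']
```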
Context is everything
Context can really make or break a voice interaction.
The chart below shows the different ways people are using voice right now, and you can clearly see that some settings are better for voice interactions than others.
Your voice assistant will struggle in a loud environment. Unlike the human ear, speech recognition has a hard time differentiating speech from all the other environmental sounds.
“With voice design, context is everything.”
You can use certain technologies to help, like wind noise modeling for car systems and cancellation of the audio the system itself is playing. This is why voice assistants can usually recognize speech commands while music is playing: they cancel out the music they know they are playing. However, if the noise comes from a TV or some other unrecognized source, recognition gets much harder.
Always consider the user’s context and ensure the user’s physical environment is conducive to voice.
One of the easiest ways to mess up voice design is to design for the wrong situations.
Consider a voice-activated elevator like the one in the video below. In reality, voice-activated elevators do not improve the user experience much when simply pressing a single button is easy enough!
Avoid cognitive overload
When we’re accessing information on a screen, we can peruse the information in a leisurely fashion. Our attention span is generally at a fairly consistent level.
When we interact with a voice service, we need to pay full attention right at the moment the information is delivered—or else we miss it. Afterwards, we can go back to not really paying much attention to it at all.
The role of the designer will be to figure out how much information is appropriate for a voice service to deliver at once, to avoid exceeding the cognitive load that a person can comfortably process.
User experience and voice designer Daniel Westerlund recommends using Grice’s Maxim of Quantity to strike a perfect balance. The most important takeaway from his article is to provide just enough information via voice, but absolutely nothing superfluous.
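As a minimal sketch of this idea, the function below reads out only the first few results and offers the rest on request. The limit of three items is an assumption chosen for illustration, not a figure from Grice or from Westerlund's article, and the restaurant names are placeholders.

```python
def build_spoken_response(items, max_items=3):
    """Build a short spoken reply that lists at most max_items options."""
    if not items:
        return "I couldn't find anything."

    spoken = items[:max_items]
    remaining = len(items) - len(spoken)

    # Join the options the way a person would say them aloud.
    if len(spoken) == 1:
        listing = spoken[0]
    else:
        listing = ", ".join(spoken[:-1]) + " and " + spoken[-1]

    response = f"I found {listing}."
    if remaining > 0:
        # Offer the rest instead of reading everything at once.
        response += f" Want to hear {remaining} more?"
    return response


results = ["Alpha Pizza", "Bravo Pizza", "Charlie Pizza", "Delta Pizza", "Echo Pizza"]
print(build_spoken_response(results))
# I found Alpha Pizza, Bravo Pizza and Charlie Pizza. Want to hear 2 more?
```

The user hears enough to act on and stays in control of how much more to hear. The right limit for your product is something to find through testing with real users.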
As with any big disruption in tech, companies will want to jump on the bandwagon. But it’s important to take a step back and think about the user first and foremost. Voice will change almost all interactions we have with machines. Almost.
Be smart about where to use voice, and design it so that everybody can use it, no matter the accent, language, or tone of voice.
Voice will impact every industry, every business, and every product within the next decade. Because of this, there is a huge need for specialized talent who can successfully lead this shift in human-computer interaction.
If you want to become a go-to specialist for VUI, check out CareerFoundry’s brand-new online course for Voice User Interface Design, built in collaboration with Amazon Alexa.
Raffaela Rein is the CEO and co-founder of <a href="https://careerfoundry.com/en/courses/become-a-ux-designer">CareerFoundry</a>, one of the leading online schools for UX training. She is dedicated to educating the next generation of digital talent and helping people build careers they love. She is passionate about UX design, in particular why UX-led companies build the most successful products. Prior to CareerFoundry, Raffaela built companies for Rocket Internet and Axel Springer and worked as an investment strategist for BlackRock.