Voice UX 101: Certain Cases, Patterns, and Tools

2019-09-22 | News

Voice UX 101: Certain Cases, Patterns, and Tools originally published on Muzli – Design Inspiration – Medium

Some UX/UI designers remain skeptical about voice interfaces. Some think it’s just marketing hype that will soon pass. Others believe voice assistants are awkward and unnatural (although they’ve never even tried one). While they hesitated, a professional community took shape, with its own secrets, patterns, mechanics, and a job market too.

I picked one smart cookie from this community to help make sense of voice tech. Kate Yulina, UX architect at Just AI, shared her thoughts on what needs to change in a UX designer’s way of thinking when they decide to dive into voice UX.

Voice is convenient. It should be

User experience cannot be designed in a vacuum. A voice skill is created for a specific case: where and when it’s useful. You can’t think of the skill first and its application second. What really matters is the situation itself. Voice is more convenient than web or mobile apps when we need a concrete function to solve a concrete problem. Why? Because we don’t have to wait for a website to load, scroll through pages, or push any buttons. Sites and apps are multifunctional. A voice skill should be geared to one concrete situation, right here and right now.

A great example is the Nike campaign that ran this February during a Lakers-Celtics NBA game. At halftime, Nike announced that a limited number of Nike Adapt BBs were on sale and that anyone could get a pair for $350 just by asking Google Assistant.

The shoes sold out in 6 minutes! Over 15,000 people placed their orders using voice alone. This goes to show that the perfect situation here was a real game where real athletes were wearing the product at that very moment.

Voice is good not only for brands; it also suits enterprise needs. Alan AI had a nice use case: technicians who service elevators in the US spent a lot of time on documentation. They had to fill out forms, record maintenance data, report on completed tasks, and so on. And while they were doing that, they weren’t doing their actual work. Alan AI applied voice AI to the problem: now technicians can fill in the blanks by voice while doing their main job or on their way to work. No time lost, no stress.

Voice UX is not about the pictures, it’s about the context of the situation. The designer’s role is to examine this context in detail to understand what a user would want to do in a situation like that.

Voice UX-patterns

One function

Once and for all: one skill, one function. If a person wants to order a cup of coffee, they use a coffee skill. Next they want to know the distance to the moon? That’s another voice skill. The all-in-one approach doesn’t work here.


UX/UI designers and marketers argue all the time about how much information is enough for the first screen. There’s nothing worse than endless scrolling while waiting for content to load. Dialog interfaces have a first screen too, but unlike the web, there’s no need to scroll, because a voice assistant is a collection of skills. A user simply says the invocation phrase that activates a single function.

Modal windows and buttons

Another name for a modal window is a dialog. What are modal windows used for? To confirm or cancel a user’s intention to do something. In real life, people express their intentions with a yes or a no, and nobody wants buttons, trust me.
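To make the modal-window analogy concrete, here is a minimal sketch of a spoken confirmation step. It is only an illustration: the word lists and the `parse_confirmation` helper are assumptions, and a real assistant platform would normally supply its own yes/no recognition.

```python
# Hypothetical word lists; a production skill would rely on the
# platform's own yes/no intents rather than keyword matching.
YES = {"yes", "yeah", "yep", "sure", "ok", "okay"}
NO = {"no", "nope", "nah", "cancel"}


def parse_confirmation(utterance):
    """Return True for yes, False for no, None when the answer is unclear."""
    words = {w.strip(".,!?") for w in utterance.lower().split()}
    if words & YES and not words & NO:
        return True
    if words & NO and not words & YES:
        return False
    return None  # ask again instead of guessing
```

Returning `None` for mixed or unrelated answers lets the skill repeat the question, which plays the role of a modal window that stays open until the user decides.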


Smart speakers, smart displays, smart homes, smartphones, talking toys, and car dashboards: these devices define the context of use. Some things are perfect for a smart speaker at home, which is totally different from in-car use. One and the same assistant should have a different skill set on different platforms. You have to define both the context of use and the device itself. Check whether voice can be combined with any other mode of interaction.

You can’t invoke custom skills if you don’t know the invocation phrase. Besides, invocation commands vary depending on the ecosystem. In terms of UX, this is the biggest vulnerability. That’s why major companies are working hard on skill discovery; over time, finding new and trending skills should become much easier.

BTW, VUI designers insist that skills with the same mechanics built for different assistants should be treated as separate projects.

User interface design in 7 stages

I’ve been asking VUI designers and users of our chatbot builder where they begin their work on a script. Everyone said it starts with an idea, which can be described in any format in any text editor.

The fastest and easiest way to understand how the dialog between a user and your app will be structured is to write down an example of it. It’s a text file describing how the flow plays out. These dialog examples will remind you of a film script, where every line is assigned to a role.

Pavel Gvay, tortu.io co-founder and CEO

Notion dialog example

1. Greet your user

Tell them what your bot can do. Use short phrases. Finish with a yes-no question so that users understand what is expected of them.

Bad “Hey! I’m Activity bot and I can’t live a day without sport. Sport is my passion, my inspiration! Also, I’ve got a great experience as a coach and I have a hundred exercises stored in my base. I’d love to share my favorite ones with you!”

Good “Hello! I am Activity, an athlete bot. I’d be glad to recommend a set of exercises. Do you want me to tell you about yoga?”

2. Think through the user flow

Designers use flowcharts to work with user flows. Flowcharts define the app’s logic: a flowchart consists of the dialog steps, and sometimes logical elements are added, such as API requests and context handling.

Miro flowchart

Quite often a flowchart shows only the main forks in a voice skill. Some designers use flowcharts to cover every detail of the skill. We wouldn’t recommend that, because the chart soon becomes illegible and any correction takes ages.

It’s better to start with a success path, a simple and easy-to-understand exercise that helps with user flows. Walk through it to the end and you will see where new conditions and departures from the main script appear.
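One way to make a success path concrete is to write it down as a tiny step table before drawing the full flowchart. The sketch below is only an illustration: the coffee-ordering steps, the prompts, and the `walk_success_path` helper are hypothetical and not tied to any assistant platform.

```python
# Hypothetical flow for a coffee-ordering skill: each step has a prompt
# and the step that follows when the user gives the expected answer.
FLOW = {
    "greet": {"prompt": "Hi! Want to order a coffee?", "next": "size"},
    "size": {"prompt": "Small or large?", "next": "confirm"},
    "confirm": {"prompt": "Shall I place the order?", "next": "done"},
    "done": {"prompt": "Your coffee is on its way.", "next": None},
}


def walk_success_path(flow, start="greet"):
    """Return the prompts a user hears when everything goes right."""
    prompts = []
    seen = set()
    step = start
    while step is not None:
        if step in seen:
            raise ValueError(f"loop detected at step {step!r}")
        seen.add(step)
        prompts.append(flow[step]["prompt"])
        step = flow[step]["next"]
    return prompts


if __name__ == "__main__":
    for line in walk_success_path(FLOW):
        print(line)
```

Reading the printed prompts in order is a quick check that the happy path sounds like a conversation; forks and error branches can then be added one step at a time.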

3. Embed navigation

Wherever your user is, it should always be possible for them to start over, go back, move forward, and phrase their replies in different ways. Never make them memorize all the commands.
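As a rough sketch of what always-available navigation can look like in code, the example below keeps a history of visited steps and checks every utterance against a few global commands first. The command phrases and the `Navigator` class are assumptions made for illustration only.

```python
# Hypothetical global commands that should work at every step of a skill.
GLOBAL_COMMANDS = {
    "start over": "restart",
    "go back": "back",
    "repeat": "repeat",
}


class Navigator:
    """Tracks visited steps so the user can go back or start over."""

    def __init__(self, first_step):
        self.first_step = first_step
        self.history = [first_step]

    @property
    def current(self):
        return self.history[-1]

    def advance(self, step):
        self.history.append(step)

    def handle(self, utterance):
        """Apply a global command; return True if the utterance was one."""
        action = GLOBAL_COMMANDS.get(utterance.strip().lower())
        if action == "restart":
            self.history = [self.first_step]
        elif action == "back" and len(self.history) > 1:
            self.history.pop()
        # "repeat" leaves the current step unchanged so its prompt is replayed.
        return action is not None
```

In practice each command would accept several phrasings, so that users never have to remember the exact wording.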

4. Write out dialog examples and think of more varied answers

My colleagues keep dialog examples in Google Sheets. They find it handy, but it’s not the best option, because you have to describe the logic and possible transitions too. Some people write scripts in Word. There’s no common format, no rules, no regulation. Just do what you feel comfortable doing.

Users get angry when the assistant repeats itself, sounding like a broken record. Nelly Kamaeva, VUX designer, confirms this. During a skill test, she saw how quickly kids lost interest when they heard the same answer repeated.

Think of a few synonymous phrases a user could hear when they reach the same step of a script. VUI designers recommend 3 to 10 alternative versions of the same phrase.
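A simple way to avoid the broken-record effect is to rotate through the alternative phrases and never play the same one twice in a row. This is a minimal sketch under that assumption; the `PhraseRotator` class and the sample phrases are hypothetical.

```python
import random


class PhraseRotator:
    """Picks a variant at random but never the same one twice in a row."""

    def __init__(self, variants, rng=None):
        if len(set(variants)) < 2:
            raise ValueError("provide at least two distinct variants")
        self.variants = list(variants)
        self.rng = rng or random.Random()
        self.last = None

    def next(self):
        # Exclude the phrase the user heard most recently.
        choices = [v for v in self.variants if v != self.last]
        self.last = self.rng.choice(choices)
        return self.last


if __name__ == "__main__":
    reprompt = PhraseRotator([
        "Sorry, I didn't catch that.",
        "Could you say that again?",
        "I missed that. One more time?",
    ])
    for _ in range(4):
        print(reprompt.next())
```

Passing a seeded `random.Random` as `rng` makes the sequence reproducible, which helps when testing a script.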

5. Explore the Catch-All for unidentified intents

Catch-All is a place where all unidentified intents fall.


Who am I talking to? Are you a robot?

Well, you got me. Do you still want to continue our conversation?

The phrase “Who am I talking to? Are you a robot?” falls into Catch-All if you didn’t cover it in the script. The phrase “Well, you got me. Do you still want to continue our conversation?” is the default reply in such scenarios. Think of a way to lend a helping hand to a user who has fallen into Catch-All.
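The Catch-All mechanism can be sketched as a matcher that returns nothing when no intent fits, with a default reply for that case. The keyword-based matching and the intent names below are assumptions made for illustration; real platforms use trained NLU models rather than keyword lists.

```python
# Hypothetical intents with the keywords that trigger them.
INTENTS = {
    "order_pizza": ("pizza", "margherita"),
    "opening_hours": ("open", "hours"),
}

CATCH_ALL_REPLY = "Well, you got me. Do you still want to continue our conversation?"


def match_intent(utterance):
    """Return the first intent whose keyword appears, or None."""
    words = [w.strip("?!.,") for w in utterance.lower().split()]
    for intent, keywords in INTENTS.items():
        if any(word in keywords for word in words):
            return intent
    return None


def reply(utterance):
    intent = match_intent(utterance)
    if intent is None:
        return CATCH_ALL_REPLY  # unidentified intents land here
    return f"Handling intent: {intent}"
```

Ending the default reply with a question gives the user a clear way back into the script.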

6. Think through Voice and Personality

Think of a talker who bores you to death. They are never any fun to talk to, a real buzzkill. The same can happen in a conversation with a skill. Alexa, Google Assistant, and others have their own speech toolkits with a wide set of male and female voices, different accents, and audio effects to liven up the talk. You could even use a cough or a sniffle if you want to.

But if you really want to impress users and draw them into the conversation, you should tinker with the grammatical styling and work harder on the speech synthesis: get all the accents, pauses, and tones right. It’s meticulous work, but I can tell you it’s worth the effort.

Another option is a professional voiceover. Yes, it takes time, it’s expensive, and you lose flexibility whenever you want to add something new to the script, but it’s effective. Your skill can speak in the voice of influencers, politicians, and movie characters.

We used recordings made by professional actors for one of our business projects. Only 0.5% of users suspected a bot, and even then they weren’t sure; it was just a guess, as they told us later.

7. Use sounds and illustrations to create a special atmosphere

You can pick sounds you like from a sound library or create your own (remember that Alexa is quite demanding about audio file formats, so converting will take some time). For instance, there’s a skill that simulates a friend or partner right there beside you, enjoying a peaceful sleep by means of… snoring!

VUI is not limited to flowcharts and dry text. You have all the resources to create cool and engaging skills.

More hints from VUI designers

Don’t tell your users how to use your interface

Voice is a familiar, natural interface. So don’t teach users how to talk; they already know.

Bad “To hear the message again, say ‘Hear again’. To skip to the next message, say ‘Skip to the next message’.”

Good “Do you want to hear the message again, or shall we skip to the next one?”

Ask yes-no questions

I recommend avoiding open-ended questions and statements; it’s better to move the user toward action.

Bad “Hi! I’m Symphony, an audiophile bot. I’d love to recommend an album and tell you everything about it”.

Good “Hi! I’m Symphony, an audiophile bot. I’d love to recommend an album and tell you everything about it. Do you want to hear about the song of the day?”

Avoid bureaucratic language

It’s an obvious recommendation, but very few people follow it. No one wants to read complex, overloaded text, and hardly anyone wants to hear it.

Bad “It is crucially important to notice that following albums this brilliant singer recorded have become platinum, which makes it possible to draw a conclusion about the successfulness of the debut album as a means of an effective entrance to the international arena.”

Good “The debut album drew the world’s attention to the artist. No wonder his next records went platinum!”

Test it all the time

Test your skill in silence, on a crowded street, and in a noisy room; speak with different intonations and at different speeds. Even in the quietest place in the world things can go wrong. Some may think testing is dull and boring, but trust me, it’s not. Every skill needs a real crash test!

From my personal experience: I was once developing a fitness skill in which Alexa gives instructions, then the music turns on and the user has to repeat the exercises. I tested everything enthusiastically myself: I jumped and ran, I changed the song length, and I repeated the exercises about a thousand times, until the skill’s UX was satisfying.

Dare and swear

Your users are provocateurs. I promise you, they will test your skill with off-topic requests. Say you’ve built a skill that helps order a pizza, and some sneaky guy asks it for sushi. Come up with a fitting reply for such cases.

One more thing: curse heartily. I mean it! We at Just AI even have a vulgarity checklist that we use during testing.

Listen and listen again

Say everything you’ve come up with out loud. Listen to everything your user would hear. Ask your friends and coworkers to read it and even act it out. Record your speech and listen to the recordings several times; try things out.

Train your skill

But keep in mind that you won’t cover every eventuality on the first try. Accept this and adapt. Your skill will need more training, so read the dialogs and analyze the logs.
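Log analysis can start very simply: count which utterances most often failed to match any intent and treat the top of that list as training candidates. The sketch below assumes a hypothetical log format, a list of (utterance, intent) pairs where the intent is `None` for unmatched phrases; real platforms export logs in their own formats.

```python
from collections import Counter


def top_unmatched(log, limit=3):
    """Count utterances that fell into Catch-All, most frequent first.

    `log` is assumed to be a list of (utterance, intent) pairs where
    intent is None when nothing matched.
    """
    misses = Counter(
        utterance.strip().lower()
        for utterance, intent in log
        if intent is None
    )
    return misses.most_common(limit)


if __name__ == "__main__":
    sample_log = [
        ("Do you have sushi?", None),
        ("pizza please", "order_pizza"),
        ("do you have sushi?", None),
        ("are you a robot", None),
    ]
    for utterance, count in top_unmatched(sample_log):
        print(count, utterance)
```

Phrases that keep showing up in this list are good candidates for new intents or extra training examples.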

Designer tools

A piece of paper, a pencil or a whiteboard — there’s nothing better than this set to begin your work with. But there are other advanced tools that can make the life of a VUI designer so much easier:


This is a conversational flow chatbot builder with an NLU engine. It has over 10,000 users and 1,100 skills for voice assistants, with a total audience of 1 million users.

It has a free trial period. You can use this builder to create a skill, test it, and integrate it into several channels. There’s a 24/7 tech support community in case you have any questions.


It’s a tool for quick prototyping. You can literally build a dialog between a user and a system step by step using a flowchart, and then test it with the prototype. It’s great for Wizard of Oz (WoZ) testing and quick hypothesis testing.


It’s a graphical designer for Alexa Skills. It lets you design, prototype, and publish voice apps with no coding experience. Good for UX testing.


A graphical platform for creating and managing chatbots. Suited for UX testing.

Educational resources

Voice Tech

Catalog site


The mindset and the real user experience here differ from what we are used to in a web or mobile environment. And that’s the most interesting part: since the voice field is relatively new, it’s the best place to experiment, invent, and make great strides.
