Voice User Interface (VUI) is very much in vogue so we decided to pick up some developer tools to see how easy or hard it is to design and prototype experiences on Amazon’s Echo – one of the market's leading devices.
Before diving into exactly what we achieved, let’s take a look at the device itself. In a nutshell, the Amazon Echo is a 360-degree omni-directional speaker powered by a cloud-based virtual assistant called “Alexa”. The Echo consists of a cylindrical speaker with a seven-piece microphone array; when it hears its “wake word”, Alexa begins listening to your speech.
Potentially, this smart speaker lets you order dinner, check the weather, turn on the lights, adjust your thermostat, call a cab, play radio stations, and check your bank balance, using only your voice.
Alexa also provides developers and designers with built-in capabilities, referred to as ‘skills’. The ‘Alexa Skills Kit’ lets you teach Alexa new skills, which users access by asking Alexa questions or making requests. You can build skills that offer users many different kinds of abilities, which gives you scope to imagine and build entirely new user experiences.
We set ourselves two goals:

- Build a practical understanding of the product and service design potential of Echo
- Prototype a new product which would be useful for Foolproof folk
Initially, we puzzled over how the day-to-day life of staff in the Foolproof offices could be improved. If you think about it, this is a case of practising what we preach: our team are our users, and creating the best possible experience for them is key.
We discovered that one longstanding issue facing Foolproofers travelling between the London, Norwich and Singapore offices was a lack of local knowledge of the area they found themselves in. For example, team members travelling to the London office and spending the night away from home often found themselves at a loose end in the evening.
As a result, we identified an opportunity to deploy Alexa to help staff members find places to go, and things to do, near the Foolproof London office. This led us to design and develop our own Alexa skill: “Outpost”.
What is Outpost?
A Google Maps-enabled office concierge built on a hyperlocal recommendation system, tailored to the office needs and contexts that Foolproofers shared with us during research. Outpost offers recommendations and information including locations, distances and descriptions. Basically, it’s a pretty nifty responsive mapping system which considers users’ requests in the context of the local area.
We interviewed people who often spend time away from their home office to better understand their situation. It became apparent that team members based in the London office had a low awareness of what was around them. They often found themselves going to the same places for lunch or a coffee, while wanting to try something new but lacking a convenient and reliable way to find good places nearby.
After this initial research, we ran a focus group to dig deeper. We invited Foolproofers to talk about their individual circumstances, what made a good venue for them, and how they personally used VUIs, all with a view to informing the product’s development.
The focus group revealed that only a handful of people used voice assistants regularly on their smartphones, although several had begun using an Amazon Echo at home. This suggested that users were still self-conscious about speaking to voice interfaces in public spaces, which presented Outpost with a design challenge: encouraging usage in a busy office environment.
How we made Outpost
From these insights, we were able to define what made a good venue recommendation for each situation. We were then in a position to start defining how users would ask Alexa for recommendations and how they expected her to respond.
This fed into our design approach. Our first port of call was designing an effective VUI, and in doing so it was important to look at:
- Getting information from the user
- Presenting information to the user
- Using text-to-speech effectively
- Handling dialogue errors
We began the process by creating a flow diagram that mapped out how users interact with the skill. This flow diagram encompassed the requests users could make and the possible outcomes of those requests. This diagram was then used when designing the detailed elements of Outpost’s interface.
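The kind of flow diagram described above can also be expressed as a simple state machine, which makes it easy to sanity-check that every branch is reachable. The sketch below is illustrative only; the intent names and prompts are our own stand-ins, not Outpost’s actual dialogue model:

```python
# A hedged sketch of a skill's dialogue flow as a state machine.
# Intent names and prompts are illustrative placeholders, not Outpost's
# real model. Built-in Alexa intents use the AMAZON.* naming convention.
DIALOGUE_FLOW = {
    "LaunchRequest": {
        "prompt": "Welcome to Outpost. What are you looking for?",
        "next": ["RecommendVenueIntent", "AMAZON.HelpIntent"],
    },
    "RecommendVenueIntent": {
        "prompt": "Here's somewhere you might like. Want another suggestion?",
        "next": ["AMAZON.YesIntent", "AMAZON.NoIntent"],
    },
    "AMAZON.HelpIntent": {
        "prompt": "You can ask for lunch spots, coffee places or client venues.",
        "next": ["RecommendVenueIntent"],
    },
    "AMAZON.YesIntent": {
        "prompt": "How about this one instead?",
        "next": ["AMAZON.YesIntent", "AMAZON.NoIntent"],
    },
    "AMAZON.NoIntent": {
        "prompt": "Enjoy! Goodbye.",
        "next": [],  # conversation ends here
    },
}


def reachable_states(flow, start="LaunchRequest"):
    """Walk the flow from launch and return every state a user can reach."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        stack.extend(flow.get(state, {}).get("next", []))
    return seen
```

Checking that `reachable_states` covers every key in the flow is a quick way to catch dead-end branches before any scripting work begins.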
Finally, we generated a set of sample utterances mapped to our users’ intents. These denote the phrases users say when interacting with your skill. There are a lot of values to enter, but the input is well worth it to maximise overall usability.
With the bedrock of the application coded, we moved on to the model Amazon provides for creating Alexa skills, which offers quite a bit of flexibility. We began by defining an invocation name that users say to launch and interact with the skill – in Foolproof’s case, ‘Outpost’. We chose Outpost because it met Amazon’s criteria for invocation names and was memorable and easily recognised by Alexa, all of which contributes to a positive user experience. In practice, this means the service comes to life when users utter the invocation name, for example, “Alexa, ask Outpost where I can go for a client lunch".
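An Alexa skill’s invocation name, intents and sample utterances all live together in a JSON interaction model. The snippet below sketches what a model for a skill like Outpost might look like; the intent, slot and utterance wording is our own illustration, and only the overall shape follows the Skills Kit schema:

```python
# A sketch of an Alexa Skills Kit interaction model for a skill like Outpost.
# Intent, slot and utterance wording is illustrative; the nesting
# (interactionModel -> languageModel -> intents/types) follows the ASK schema.
INTERACTION_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "outpost",  # invocation names must be lowercase
            "intents": [
                {
                    "name": "RecommendVenueIntent",
                    "slots": [
                        # The slot placeholder reappears inside the utterances
                        {"name": "occasion", "type": "OCCASION_TYPE"},
                    ],
                    "samples": [
                        "where I can go for a {occasion}",
                        "recommend somewhere for a {occasion}",
                        "find me a place for a {occasion}",
                    ],
                },
                # Built-in intents need no samples of their own
                {"name": "AMAZON.HelpIntent", "samples": []},
            ],
            "types": [
                {
                    "name": "OCCASION_TYPE",
                    "values": [
                        {"name": {"value": "client lunch"}},
                        {"name": {"value": "coffee"}},
                        {"name": {"value": "team dinner"}},
                    ],
                },
            ],
        }
    }
}
```

Every `{occasion}` placeholder in the sample utterances maps to the slot of the same name, which is how Alexa knows which word in the user’s sentence to capture.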
The next step on our VUI design journey was to script utterances and intents. Utterances are recognised phrases that form part of a sentence that a user might say. Alexa tries to match a whole sentence a user says with one utterance, and each utterance maps to an intent. An intent represents what information the user is trying to get from the skill.
Once Alexa determines the intent from what the user has said, it will look for keywords in the sentence, or “slot values”. Those are parameters that an intent expects in order to process the user request. When you define utterances, you add slots to those sentences, so when a user sentence is matched to an utterance, the slot values will also be matched. Those values are accessed in the server-side code to do the computations, resulting in the appropriate response from the device.
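To make the slot-handling step concrete, here is a hedged sketch of the server side: pulling the matched slot value out of an incoming request and building a plain-text response. The venue data is a hard-coded placeholder (the real skill looked places up via Google Maps), and the request shape is simplified to the fields this function reads:

```python
# Placeholder venue data standing in for the real Google Maps lookup.
RECOMMENDATIONS = {
    "client lunch": "Example Brasserie, a short walk from the office.",
    "coffee": "Example Espresso, just around the corner.",
}


def handle_intent(request):
    """Map a RecommendVenueIntent request to an Alexa-style JSON response.

    Reads the matched "occasion" slot value and answers with plain text,
    following the outputSpeech shape of an Alexa custom-skill response.
    """
    slots = request["request"]["intent"]["slots"]
    occasion = slots["occasion"].get("value", "").lower()
    speech = RECOMMENDATIONS.get(
        occasion, "Sorry, I don't have a recommendation for that yet."
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

When the slot is missing or unrecognised the function falls back to an apology, which is one small example of the dialogue-error handling listed earlier.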
What we learnt
We found the way users interacted with Outpost surprising. The differences from user to user were striking and often unlike the scenarios we had planned for, which alerted us to our own speech bias. Despite running a focus group to identify the many ways users interact with Alexa, uptake of the application remained relatively low, mirroring our research finding about people’s reluctance to engage with VUIs in public spaces.
Programming a fully functional application for the Amazon Echo was challenging, and despite getting one working, it is unlikely that we’ll ever release a finished product. This is due to the extensive time investment required from developers and UX designers, coupled with the (admittedly) marginal value we came to realise the product could offer Foolproofers.
We learnt a lot from the project as a whole, but on reflection our experience highlighted the need to be clear about exactly what value users will gain from your VUI. Balancing that against the time and money that would need to be invested in product development is key.
VUI product ideas are easy to generate on a whiteboard but darned tricky and time-consuming to make a reality. Having been through our own research and development process, we can see the value of creating VUI applications, but you need to think carefully about whether they create enough value for your customers to warrant the substantial investment.