The Rise of the Voice User Interface: An Alexa Skill Case Study

February 28, 2017 James Caple

My first introduction to talking machines was the 1983 movie “WarGames,” in which NORAD’s War Operation Plan Response (WOPR) mainframe first talked to David Lightman (Matthew Broderick) and Jennifer Mack (Ally Sheedy) in Lightman’s bedroom as he tried to break into a DoD system, thinking it belonged to a game company. It would seem our collective consciousness has wanted machines to talk to us ever since. These days, a growing number of Voice User Interfaces have been integrated into our day-to-day lives, from Microsoft’s Cortana and Apple’s Siri to Google Home and Amazon’s Alexa.

Dev Technology Group, Inc. has been working with Alexa Skill development to learn more about creating Voice User Interfaces for easier human-to-machine interaction, and to gain experience with AWS Lambda microservices and DynamoDB. To this end, Dev Technology has developed a new Alexa Skill called U.S. Border Wait Times. The Skill integrates with the publicly accessible U.S. Customs and Border Protection (CBP) Border Wait Time service to provide hands-free retrieval of U.S. border wait time information. Our Alexa Skill is now available in the Amazon Marketplace.

In developing this Alexa Skill, we learned just how hard it is to design a voice user interface that is simple yet still lets the machine understand the myriad ways humans use their voices to communicate. Developing an Alexa Skill, especially one that will pass Amazon’s stringent certification process, can be difficult. Here are some things we learned while going through the Alexa Skill development and certification process.

Natural Language Processing (NLP) Is Hard

NLP is the science of parsing human language so that machines can understand it. NLP stands at the intersection of computer science, artificial intelligence and computational linguistics. As of the first quarter of 2017, however, we still seem a long way from the intelligent WOPR machine that Lightman could converse with in natural language from a keyboard in the movie WarGames. But we seem to be getting closer. And in the case of Alexa Skill development, thankfully Amazon is doing most of the heavy technical lifting for us in this regard. Nevertheless, it can still be difficult to understand exactly what the user is saying through the Alexa interface.

For example, in developing an Alexa Skill, you create an Interaction Model by defining possible phrases the user can use with your Skill, as well as data buckets (or custom slots) containing words and phrases that are meaningful to your Skill. In our U.S. Border Wait Times implementation, we defined custom slots to hold a finite list of border port and crossing names. Sometimes Alexa would pass values into our Skill that were not defined in these custom slots. For example, our Skill would sometimes receive a value like “Evil Pass” when the user intended to say “Eagle Pass,” or “Garage Falls” when the user really said “Niagara Falls.” Part of the problem is that humans have localized accents and a myriad of ways of saying things that Alexa has not been trained to understand yet.

Dealing with this level of complexity in a comparatively simple Alexa Skill was almost a showstopper during our development. One way of mitigating the problem was to put all possible border port and crossing names into arrays and to use the Levenshtein distance algorithm to compare what Alexa heard from the user against the custom slot values our Skill accepts, and so make a better guess. Implementing this algorithm in JavaScript greatly improved how often our Alexa Skill correctly “guesses” which port the user is asking about. The misheard value “Evil Pass,” for example, now scores a 70% match against the border port name “Eagle Pass,” so we retrieve information for “Eagle Pass” on the user’s behalf. Of course, when the best match score is too low, say below 55%, we simply ask the user to repeat the request for a second try. All of this results in a much better user experience with the interface.
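The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual code: the three port names are only a sample of the slot values, the function names are made up for the example, and the score is assumed to be 1 minus the edit distance divided by the length of the longer string (an assumption that happens to reproduce the 70% figure for “Evil Pass”).

```javascript
// Hypothetical sample of custom slot values; the real Skill lists every port and crossing.
const PORT_NAMES = ["Eagle Pass", "Niagara Falls", "Calais"];

// Scores below this cutoff are treated as "not understood" (the 55% mentioned above).
const MIN_SCORE = 0.55;

// Dynamic-programming Levenshtein distance, keeping only the previous row in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or a free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 - distance / length of the longer string, case-insensitive.
function similarity(a, b) {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Returns the closest name and its score, or null when nothing clears MIN_SCORE.
function bestMatch(heard, candidates = PORT_NAMES) {
  let best = null;
  for (const name of candidates) {
    const score = similarity(heard, name);
    if (best === null || score > best.score) best = { name, score };
  }
  return best !== null && best.score >= MIN_SCORE ? best : null;
}
```

Under these assumptions, `bestMatch("Evil Pass")` selects “Eagle Pass” with a score of 0.7 (an edit distance of 3 over a longest length of 10), while a `null` result is the cue to ask the user to repeat the request.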

Natural Language Processing Is Magical

As difficult as NLP can be, as we have just demonstrated to one small degree, when it works it seems magical. There’s something profound about being able to ask a machine to tell you about real events taking place around the world without having to use a keyboard or mouse to convey your request. The work being done at companies like Microsoft, Google, Apple and Amazon to facilitate NLP technologies like this, and to make them publicly available, is quite amazing.

The AWS Platform Is Incredibly Powerful

Our U.S. Border Wait Times Alexa Skill is built and hosted entirely on the Amazon Web Services (AWS) platform. The back end of the Alexa Skill is written in Node.js and Python and runs entirely as AWS Lambda microservices. The back-end data store is DynamoDB, the AWS NoSQL database, and the implementation is monitored and instrumented with CloudWatch. All of these AWS platform services make building secure and scalable back-end infrastructure very easy and fun!

[Figure: Alexa Skill architecture]
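To make the flow concrete, here is a rough sketch of what a Node.js Lambda entry point for a Skill like this might look like. It is not our production code. The `Port` slot name, the sample wait value, and the `lookupWaitMinutes` helper are all hypothetical, and the helper is an in-memory stand-in for the DynamoDB query. Only the request fields it reads and the shape of the returned JSON follow the Alexa Skills Kit request and response format.

```javascript
// Made-up sample data standing in for the DynamoDB table.
const SAMPLE_WAIT_MINUTES = { "Eagle Pass": 20 };

// Hypothetical helper; a real Skill would query DynamoDB here.
async function lookupWaitMinutes(portName) {
  return SAMPLE_WAIT_MINUTES[portName];
}

// Wraps plain text in the Alexa Skills Kit JSON response envelope.
function speak(text, shouldEndSession = true) {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text },
      shouldEndSession,
    },
  };
}

// Lambda entry point: Alexa delivers the request JSON as `event`.
async function handler(event) {
  const request = event.request || {};
  if (request.type !== "IntentRequest") {
    // For example a LaunchRequest: prompt and keep the session open.
    return speak("Which border port would you like wait times for?", false);
  }
  // "Port" is a hypothetical custom slot name.
  const slots = (request.intent && request.intent.slots) || {};
  const port = slots.Port && slots.Port.value;
  const minutes = port ? await lookupWaitMinutes(port) : undefined;
  if (minutes === undefined) {
    return speak("Sorry, I did not catch that port. Please say it again.", false);
  }
  return speak(`The current wait at ${port} is about ${minutes} minutes.`);
}

// Export for the Lambda runtime when running as a CommonJS module.
if (typeof module !== "undefined") module.exports = { handler };
```

Keeping the data lookup behind a single helper means the handler can be exercised locally with sample data before it is wired to DynamoDB.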

Practice Makes (Almost) Perfect

During our testing, we also learned how stark differences in local vernacular need to be accounted for in natural language processing applications. For example, one of our Alexa Skill testers happens to be from Maine. One of the border ports we tested, Calais, is on the border between Maine and Canada. Alexa passes a wide variety of string values when a user asks about the Calais port using the pronunciation “Calay,” for example “Caillat,” “Kalleh,” and even “Cialis.” To make matters worse, our tester from Maine told us that in that part of the United States, Calais is pronounced “Callous.” Who knew? So, our Alexa Skill had to accommodate these more localized pronunciation patterns to correctly and efficiently retrieve data for the Calais border port and its associated crossings. As our Skill is more widely used, I am sure we will discover even more localized voice pattern idiosyncrasies.
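One simple way to accommodate known mis-hearings like these is a lookup table that is consulted before any fuzzy matching. The sketch below is only an illustration of that idea, not a description of how our Skill handles it. The table holds the variants mentioned above, and the names `HEARD_AS` and `resolveAlias` are made up for the example.

```javascript
// Variants reported above, mapped to the canonical port name (keys are lowercase).
const HEARD_AS = {
  caillat: "Calais",
  kalleh: "Calais",
  cialis: "Calais",
  callous: "Calais",
};

// Returns the canonical name for a known mis-hearing, else the trimmed input unchanged.
function resolveAlias(heard) {
  const trimmed = heard.trim();
  const key = trimmed.toLowerCase();
  return Object.prototype.hasOwnProperty.call(HEARD_AS, key) ? HEARD_AS[key] : trimmed;
}
```

A table like this is easy to extend as testers from other regions report new variants, at the cost of having to maintain it by hand.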

Overall, Voice User Interfaces are an excellent new way of interacting with our machines and asking them to do things on our behalf. While NLP and Artificial Intelligence capabilities are not quite sufficient for fully understanding the seemingly infinite ways in which humans can communicate with their voices and writing, this exciting area of technology is certainly at a stage in which voice interactions with machines are truly beneficial and even fun.

Helpful Links:

U.S. Border Wait Times Alexa Skill in the Amazon Marketplace

U.S. Customs and Border Protection Border Wait Times on the web

WarGames, the movie

The Levenshtein distance algorithm

Alexa Skills Kit