Time magazine put out an exclusive that Amazon’s popular assistant Alexa will soon be able to identify people by voice. The magazine makes this sound like an Amazon breakthrough, but it’s far from it. In fact, this is the most underwhelming tech story I can remember. It seems the writer is not at all familiar with HUI technologies, or she would have known that voice recognition (also called voice biometrics or speaker recognition) is quite old by tech standards. It was first patented by the Italian company CSELT in 1983.
The article mentions an incident in which a 6-year-old girl used Alexa to order an expensive dollhouse and cookies as an example of why user identification is needed—an extremely obvious and expected use case.
Identifying users benefits both the customers and Amazon. From the customer’s perspective, it allows control over purchases and potentially privacy or gatekeeping for some of Alexa’s native feature set or for “skills” (capabilities created by contributing developers). For Amazon, it resolves the unauthorized-order issue (presumably it’s happened many times), increases overall security, and gives the company even more granular user tracking. By contrast, the company’s website or app can be shared among users (such as family members), so precisely identifying speakers and their actions and intents is the one element of this that is a big deal for Amazon: it will potentially enable more accurate user profiling and product targeting.
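As a rough sketch of how speaker identification could gate purchases, consider the following. All names, vectors, and thresholds here are hypothetical; real systems derive voice embeddings from audio with trained models, whereas this sketch uses synthetic placeholder vectors and simple cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify_speaker(embedding, profiles, threshold=0.85):
    """Return the best-matching enrolled speaker, or None for an unknown voice."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Enrolled household members (synthetic placeholder embeddings).
profiles = {
    "parent": [0.9, 0.1, 0.3],
    "child": [0.2, 0.8, 0.5],
}

def can_order(embedding, authorized=frozenset({"parent"})):
    """Gate purchases: only recognized, authorized voices may place orders."""
    speaker = identify_speaker(embedding, profiles)
    return speaker in authorized
```

The same pattern extends naturally to gatekeeping individual skills: instead of a single `authorized` set for ordering, each feature could carry its own list of permitted speakers.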
Nevertheless, this is a predictable feature that, if anything, is late in coming to the platform.
Astonishingly, the author implies that this up-and-coming feature will give Alexa an advantage over Google Assistant. Yeah, possibly for a moment in time, until Google adds speaker recognition and every other assistant adopts the technology too. Not because they are mimicking, but because it’s a logical capability that is essential to empowering an AA (artificial assistant) and that will also (unfortunately) enable expanded user tracking.
During our stint spearheading an artificial assistant project called ALisA, Deb Benkler and I were creating a roadmap of technologies that would be implemented in ALisA over time. Early in the plan was voice recognition. We already had experience with it, and knowing that it would play an important role in making an assistant more capable and (dare I say) more human-like, we considered incorporating speaker recognition a no-brainer. It was not about gathering user data, but about security, gatekeeping of certain functions, and a couple of other still-proprietary procedures.
So the real takeaway is this: there are a bunch of HUI technologies coming to voice assistants in the next one to three years (maybe sooner) as they all cross the uncanny valley, move closer to being human-like, and gain trust.
Here are some to expect:
- Image (object) recognition
- Facial recognition
- Affective (emotional) recognition (face and voice)
- Gesture recognition
For the record, these technologies exist today: they are well developed, reasonably inexpensive, and available from multiple providers, big and small. Also, as these assistants gain better machine learning and AI competencies, they will display more predictive computing capabilities, especially in commerce-related interactions.