This article is part of our series that explores the business of artificial intelligence.
Earlier this week, Business Insider reported that Amazon is on pace to lose $10 billion from its Alexa voice assistant and its smart speaker business. The news comes at a time when Alexa’s competitors are dealing with difficulties of their own and struggling to find a way to monetize their voice assistants.
The current state of Alexa and other voice assistants is a reminder of the gap between creating an impressive technology and building a business around it. And as tech companies lay off staff and brace for a looming recession, cool-but-unprofitable technologies will have to find a way to show their worth or be gutted to keep their organizations alive.
After eight years of Amazon Alexa, here’s what we can take away from the technology and business of voice assistants.
The technology is good, but not good enough
Many innovations enable voice assistants like Amazon Alexa to accomplish tasks that were impossible 10-15 years ago. Advances in automated speech recognition help the assistant pick up and parse your voice against background noise, among other voices, and across different accents. Natural language processing systems powered by deep neural networks (transformers, RNNs, LSTMs, etc.) help assistants map the many ways of phrasing a request to the corresponding command (think of all the different ways you can ask for the weather or request a timer). And a range of application platforms and APIs allow voice assistants to draw on the vast amount of information on the web and to map voice commands to application functions.
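To make the "many phrasings, one command" idea concrete, here is a minimal sketch of intent mapping. It is a toy keyword matcher, not the neural approach production assistants use, and the intent names and patterns are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical intents and trigger patterns (illustrative only; real
# assistants learn this mapping with neural models, not hand-written rules).
INTENT_PATTERNS = {
    "get_weather": [r"\bweather\b", r"\bforecast\b", r"\brain\b", r"\bumbrella\b"],
    "set_timer": [r"\btimer\b", r"\bremind me in\b", r"\bcount ?down\b"],
    "play_music": [r"\bplay\b", r"\bmusic\b"],
}


def classify_intent(utterance: str) -> str:
    """Return the intent whose patterns match the utterance most often."""
    text = utterance.lower()
    # Count how many patterns of each intent appear in the utterance.
    scores = {
        intent: sum(1 for pattern in patterns if re.search(pattern, text))
        for intent, patterns in INTENT_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    # If nothing matched at all, report that the request was not understood.
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    for phrase in [
        "What's the weather like today?",
        "Do I need an umbrella tomorrow?",
        "Set a timer for ten minutes",
        "Order me a pizza",
    ]:
        print(phrase, "->", classify_intent(phrase))
```

Two very different sentences about the weather land on the same `get_weather` intent, while a request outside the known patterns falls through to `unknown`. A learned model generalizes far beyond a fixed keyword list, but the input and output have the same shape: free-form speech in, one discrete command out.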
However, there are also limits to what voice assistants can do today. For the most part, Amazon Alexa can perform simple tasks, such as setting timers, playing music, reporting the weather, and searching for simple information on the web.
These are tasks that are either very narrow and don’t have much room for making errors or are non-sensitive and don’t cause much damage if the assistant gets it wrong.
As soon as you want to perform tasks that are sensitive, require multiple interactions, or are multi-modal in nature, voice assistants become unreliable. For example, consider making purchases, one of the important use cases Amazon initially planned for Alexa. This is a sensitive task because it involves money, and users expect minimal errors. It is also a complicated task because it usually requires multiple steps: users want to see the product they’re buying, and they want to be able to browse suggestions and alternatives. This is very difficult to do with a voice-only interface. The same limitations apply to other tasks, such as scheduling meetings.
What do people pay for?
So, you created a very cool voice assistant that can perform a bunch of tasks with decent accuracy and others sub-optimally. How do you monetize it? In the context of current applications, there are several solutions.
The first solution is to sell hardware, such as the different variations of the Amazon Echo, Apple HomePod, or Google Nest smart speakers. In this case, the business value is tied to the price of the device, the number of devices you sell, and how frequently customers replace them. This works for smartphones because people spend several hundred dollars every few years to get an upgraded iPhone or Pixel phone. But it doesn’t work for smart speakers. First, people aren’t willing to pay a very high premium for them because they don’t use them that often. Second, there is not much to upgrade in a smart speaker (it’s basically a microphone and a speaker, sometimes with a display), so customers have no reason to replace them regularly. Finally, upgrading and maintaining the cloud services that power the voice assistant costs money. In other words, every speaker that stays in use keeps adding costs for its vendor, and eventually those costs exceed the profit made from selling the device.
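The arithmetic behind that last point can be sketched in a few lines. This is a back-of-the-envelope model under a simplifying assumption (a one-time hardware margin and a constant monthly cloud cost per device), and the numbers in the example are hypothetical placeholders, not Amazon’s actual figures.

```python
import math


def months_until_break_even(hardware_margin: float, monthly_cloud_cost: float) -> float:
    """Months of use at which cumulative cloud cost equals the one-time margin."""
    if monthly_cloud_cost <= 0:
        # With no ongoing cost, the device never turns into a loss.
        return math.inf
    return hardware_margin / monthly_cloud_cost


def lifetime_profit(hardware_margin: float, monthly_cloud_cost: float, months_in_use: int) -> float:
    """Profit per device after it has been in use for the given number of months."""
    return hardware_margin - monthly_cloud_cost * months_in_use


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    margin = 10.0       # profit on the sale of one speaker, in dollars
    cloud_cost = 0.5    # cost of serving that speaker, in dollars per month

    print(months_until_break_even(margin, cloud_cost))  # break-even point in months
    print(lifetime_profit(margin, cloud_cost, 48))      # profit after four years of use
```

With these placeholder numbers, the device breaks even after 20 months and is $14 in the red after four years. The longer a customer keeps the same speaker, the worse the vendor’s position becomes, which is the opposite of the smartphone upgrade cycle.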
A second solution is to sell services. Here, users pay a monthly or yearly fee to have access to the voice assistant on their phones or smart speakers. For this business model to succeed, your product must reach product/market fit by either addressing an unsolved problem or creating enough added value in a use case to convince users to pay for it. Unfortunately, the kinds of tasks that Amazon Alexa and other voice assistants fulfill aren’t valuable enough for users to pay for.
Finally, you might consider Amazon Alexa as a channel to attract users to other money-making products. For example, Amazon believed that Alexa would result in users shopping more online. But because of the limits that I previously mentioned, Alexa does not provide a very good shopping experience and users still prefer to use mobile or web apps to make their purchases.
This brings me back to what I said at the beginning. Amazon Alexa is certainly impressive when you look at it from a science and engineering perspective. But from a product and business perspective, it lacks a clear path to monetization.
The next generation of voice assistants?
The first generation of voice assistants presented a good idea (using your voice as the interface to your computer) but was not able to create a profitable business model. We have seen this happen with VR headsets in the 1990s (the price was too high and the quality was too low) and with AR glasses in the early 2010s (the glasses did not provide enough added value to justify their price). Alexa and Siri are still alive because they were built by very wealthy companies that, under normal market circumstances, can afford to lose money on new products until they figure out the business model (or dispose of them).
What will be next for voice assistants? I see a few possible paths.
One solution is to wait for the AI technology to become so good that it can power fundamentally different applications (e.g., a voice assistant that is always one step ahead of you and reaches out to you instead of waiting for you to invoke it).
Another solution is to pivot from the current model of general-purpose voice assistants to more specialized assistants that can be adapted to different applications. This will enable voice assistants to be integrated into the context and workflows of applications and make them much more capable of handling complex, multi-step tasks. I think that this format can have a viable B2B business model, especially in industries where there is a lot of hands-on work (manufacturing, restaurants, hotels, etc.) and a voice assistant can improve efficiency and cut down costs. The added value for businesses will be much more than the convenience that consumers get from talking to their phones (just as the second generation of Google Glass found product/market fit in the hands-on work sector).
I’m not sure if display-less smart speakers will ultimately be the right form factor for voice assistants going forward. So many of our daily tasks involve visual elements, and the use cases of voice-only assistants will be limited. But it will be fun to see how it pans out.
Amazon has no plans to gut Alexa yet, but I think it’s probably time to move on to the next iteration of voice assistants.