What we learned from a decade of Siri, Alexa and Google Assistant

By Ramu Sunkara


Over the past decade, we have all witnessed a remarkable rise and significant achievements in the voice AI space. Today, conversational AI is a new paradigm, and the euphoria around it shows no sign of settling down.

From a business standpoint, the prospects for voice are bright. The global voice assistant market is projected to reach $30.74 billion by 2030, registering a CAGR of 31.2% over the 2020–2030 forecast period. On the consumer side, the forecast is also promising: 142 million people in the US, or 42.1% of the population, are expected to use voice assistants in 2023.

Surveys show, however, that up to a quarter of new voice device owners quit using voice assistants in the second week, and a large portion of customers use voice assistants only for a limited number of simple tasks like choosing music or asking for the time. “I am not apt to use voice assistants for things that have consequences,” says Brian Glick, founder of Philadelphia-based software company Chain.io.

As voice assistants struggle to understand the speaker’s intent, annoyed users are not the only affected party. Business is impacted as well, since unreliable voice assistants damage the reputation of companies implementing Google Actions or Alexa skills for their apps.

What are today’s most popular voice assistants—like Siri, Alexa, and Google Assistant—lacking? There are several challenges at the core of what makes an effective conversational experience.

#1: ROI – the business imperative

For corporate decision-makers, ROI is a primary focus. Conversational AI introduced into a company’s ecosystem should add value to the customer experience, reduce operational costs and generate revenue for the brand – just like traditional GUIs for web and mobile platforms do.

In reality, many enterprises are still hesitant to implement a voice strategy for fear they will not be able to prove its ROI, justify their spending on the latest conversational advancements and start reaping the benefits of voice assistants over a short-term horizon.

#2: Multimodal approach

Many conversational AI solutions today follow a unimodal, or voice-only, approach, enabling users to access their functionality only through human speech. Voice alone may be a solution for straightforward scenarios. In most use cases, however, a conversational interface should not replace but rather augment and enhance the existing application GUI.

The human brain relies heavily on visual communication: by some estimates, we consume 80% of information through what we see and only 20% through what we hear. In a similar manner, voice solutions should leverage the existing visual interfaces of software products—but they don’t, mostly leaving users with voice-only responses.

Blending voice with visuals creates a multimodal UX that supports customers’ ability to choose an interaction mode based on their preference, context and needs: voice, typing and touch, or any combination of them.

A multimodal interface offers a more transparent, robust and efficient model of interaction between applications and humans. Here, the two modalities supplement each other for a more intelligent and efficient customer experience. Voice removes the current GUI limitations—instead of fumbling between screens looking for a required option, users can trigger actions with voice commands at any time and from any place. The application GUI, in turn, provides context for the voice assistant. Voice works in sync with visuals, so users are always aware of what is happening in the application at the moment of speaking. 
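To make this voice/GUI pairing concrete, here is a minimal sketch in Python of context-aware command dispatch. All names here (`MultimodalAssistant`, the screens and intents) are hypothetical illustrations, not any particular platform’s API: the GUI reports which screen is currently visible, and the same spoken intent resolves to different actions depending on that visual context.

```python
# Hypothetical sketch: the GUI keeps the assistant informed about the
# current screen, so a voice intent is resolved against visual context.

class MultimodalAssistant:
    def __init__(self):
        self.screen = "home"   # updated by the GUI on every navigation
        self.handlers = {}     # (screen, intent) -> action callable

    def on_navigate(self, screen):
        """Called by the GUI so voice always knows what the user sees."""
        self.screen = screen

    def register(self, screen, intent, action):
        """Bind an intent to an action valid on a specific screen."""
        self.handlers[(screen, intent)] = action

    def on_voice_command(self, intent):
        """Resolve a spoken intent against the current visual context."""
        action = self.handlers.get((self.screen, intent))
        return action() if action else "Sorry, I can't do that here."


assistant = MultimodalAssistant()
assistant.register("cart", "show_total", lambda: "Your cart total is $42.")
assistant.register("catalog", "show_total",
                   lambda: "Open your cart to see the total.")

assistant.on_navigate("catalog")
print(assistant.on_voice_command("show_total"))  # catalog-specific reply
assistant.on_navigate("cart")
print(assistant.on_voice_command("show_total"))  # cart-specific reply
```

The design choice this sketch illustrates is that the voice layer never guesses what the user is looking at: the GUI pushes its state to the assistant, and ambiguous commands become unambiguous once paired with that state.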

For customers’ convenience, a voice-enabled solution should let users switch between voice and the existing GUI interchangeably. In a public setting, the GUI may be the preferred interaction mode; when the user’s hands are busy, voice is a must.

#3: Platform-agnostic design

One of the major drawbacks of voice assistants is technology lock-in. Siri, Google Assistant, Bixby and Alexa are vendor-, device-, and OS-specific, lacking cross-platform support. Only for smart speakers. Only on iOS or Android devices. Nothing for users on the Web.


Enterprises that choose to implement voice solutions are either stuck on a single platform or need to leverage a plethora of tooling, sinking a lot of time, effort and money into the development process. Furthermore, this creates a fragmented and scattered voice AI ecosystem that is hard to support and maintain.

#4: Developer-friendly toolkit

The last question is: who will bring voice to software products? As a general rule, it is the developers who built the product, and they need a low-code conversational AI platform with cross-platform support on web and mobile to seamlessly integrate voice into their workflows.

Finding the right dev tools and APIs and assembling all the pieces is probably the most challenging task. With so many moving parts—STT, TTS, intelligent tagging, named entity recognition and custom wake word support—and so many targets to run on, things can quickly get out of hand.
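To illustrate how those moving parts line up, here is a toy pipeline in Python. Every stage is a trivial stand-in (simple string matching, no real speech models), and all function names and the wake word are invented for illustration; the point is only the order in which wake word detection, STT, entity extraction and TTS hand off to one another.

```python
# Toy pipeline: wake word -> STT -> tagging/NER -> response -> TTS.
# All stages are stand-ins meant only to show how the pieces connect.

import re

WAKE_WORD = "hey app"  # hypothetical custom wake word

def detect_wake_word(audio_text):
    # Real systems run a small always-on model; here we just match text.
    return audio_text.lower().startswith(WAKE_WORD)

def speech_to_text(audio_text):
    # Stand-in for an STT engine: assume audio is already transcribed.
    return audio_text.lower()[len(WAKE_WORD):].strip()

def extract_entities(utterance):
    # Stand-in for NER/intelligent tagging: pull out quantity and item.
    m = re.match(r"add (\d+) (\w+) to my order", utterance)
    if m:
        return {"intent": "add_item", "qty": int(m.group(1)), "item": m.group(2)}
    return {"intent": "unknown"}

def text_to_speech(text):
    # Stand-in for TTS: a real engine would synthesize audio here.
    return f"[speaking] {text}"

def handle(audio_text):
    """Run one utterance through the whole pipeline."""
    if not detect_wake_word(audio_text):
        return None  # stay silent unless addressed
    slots = extract_entities(speech_to_text(audio_text))
    if slots["intent"] == "add_item":
        return text_to_speech(f"Added {slots['qty']} {slots['item']}.")
    return text_to_speech("Sorry, I didn't catch that.")

print(handle("Hey app add 2 bagels to my order"))
```

Each of these five stubs corresponds to a component that, in production, is a separate service or model with its own vendor and platform constraints—which is exactly why assembling them by hand gets out of hand quickly.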

An all-in-one conversational AI ecosystem with cross-platform support can make life easier and smoother for everyone involved: developers can create a virtual assistant once and embed it into applications built with different frameworks and programming languages, while enterprises can deliver a conversational experience to users on mobile devices, both iOS- and Android-based, and in the web and desktop versions of their applications.

Alan AI platform

Building a voice interface may seem tough at first—but not when you have the right tools. 

Conversational platforms by Apple, Google, Samsung and Amazon have definitely made a huge impact on the tech industry. Over the last decade, however, voice has not become mainstream, and most web and mobile applications still rely on traditional GUIs rather than voice experiences.

Tapping into the vast potential of voice will rely on bringing the right elements together. A voice solution should:

  • Guarantee quick ROI from voice-powered solutions
  • Provide a consistent user experience for all versions of the application, on iOS, Android and Web
  • Integrate with existing GUI and provide a rich multimodal user experience that is much closer to real-world interactions
  • Be easy to use for developers to deploy and iterate voice experiences fast

Conversation is inherent to us as human beings, and that is why a voice interface is the future for every application. With a small caveat, of course—it should co-exist with the traditional GUI we have all grown used to over the previous decades.

About the author

Ramu Sunkara

Ramu Sunkara is the CEO and Co-Founder of Alan AI, Inc.

Alan® is the technology leader for virtual assistant deployment enabling organizations to swiftly build and roll out conversational experiences for their apps and websites. Alan® ensures that organizations and their users are afforded the highest level of privacy, security, and data ownership within their own branded environments.

