
The rise of the ubiquitous voice assistant

March 8, 2019

Voice Recognition Function On Mobile Phone — Image credit: Depositphotos

Apple, Amazon and Google have done a great job of educating users that voice is here. Voice recognition has been improving for years, and over the last few years it has finally become good enough to be viable in a variety of environments. Baidu, Google and Microsoft have taken great strides toward making AI-based speech recognition and language translation better than humans. These platforms have access to the copious amounts of user data required to train their algorithms, which are getting better every day.

Voice is here. It’s easier. It’s faster. It’s better. It’s more natural.

Voice is the next paradigm for search

Amazon packaged the Echo to be both a novelty and useful enough to drive this new paradigm. Alexa failures, in fact, have become party tricks that provide a lot of humorous entertainment. Amazon continues to add features and to market Alexa heavily, for instance with this funny Alexa commercial during the Super Bowl.

Google and Apple (with Siri) continued to push this trend forward as well, so “smart speakers” are now a thing, as are smart switches and smart lights. People continue to purchase these devices, but the novelty has worn off and several things have become apparent:

Smart speakers are not broadly useful—most users only use them for a few things—music, timers, alarms, and, in some cases, home automation scenarios.
Voice-only situations are limiting in cases where a visual or hybrid mode is required – media, entertainment, shopping, etc.
Engagement levels with third-party skills are very low (skills offer limited functionality and skills syntax is hard to remember). “I’m sorry, I don’t know that” and other Alexa failures are no longer entertaining, but rather frustrating. Even Amazon acknowledges this.
Smart speakers are NOT ubiquitous—the speaker on my kitchen counter is not in my car nor in my office!

Apps – do we need to reinvent the wheel?

Amazon does not have a mobile presence like Google (Android) and Apple (iOS), so it ignored, and continues to ignore, the need to incorporate mobile apps into voice environments. App developers need to do extra work to plug into voice platforms, i.e., reinvent their services. Skills do very little compared to mobile apps, and developers see low engagement from these new “smart-speaker” search entry points.

As voice platforms realized that users continue to use mobile apps when they are on the go, they started to include mobile apps. Apple, Google, and Samsung have acknowledged this need. Bixby 1.0 introduced third-party app support, but it was a poor experience and Samsung has not been able to scale it. At Google I/O in mid-2018, Google finally announced its foray into including apps with Slices and App Actions. Its results are similar to Apple’s slow progress on Siri, despite the announcement of Siri Shortcuts. The ongoing challenge is that developers still need to do the heavy lifting, so the coverage of useful mobile app actions on these platforms remains very low.

When Amazon announced 100+ million Alexa devices before CES in 2019, Google responded by announcing 1+ billion devices with Google Assistant (on Android phones), and Apple put up a huge billboard at CES talking up privacy: “what happens on your iPhone stays on your iPhone”!

iPhone privacy billboard at CES — Image credit: Rajat Mukherjee

Amazon still does not have a mobile footprint and has not embraced the need for mobile apps to become first-class citizens in the voice world.

Millions of users already use these powerful mobile apps on their phones every day, and we don’t see this changing dramatically in the near future. The only way for voice assistants to be ubiquitous is to embrace the simplicity of mobile apps and the trust that users place in them. In fact, the IDC Consumer IoT Survey presented at CES 2019 shows that voice usage is almost twice as high on mobile devices (~70%) as on smart speakers (~36%).

IDC survey voice assistant usage — Image credit: IDC

There’s no need to reinvent the wheel. We should incorporate user needs into these systems in a practical, simple way. We need apps to be included broadly in these solutions.

As users, we don’t need to spend a lot of money on devices that do less than what we already have in our pockets and handbags: our smartphones, which are more powerful and more capable than ever before and provide visual data as well as voice.

Apps may change, smartphone form factors may change to blend into wearables, but the concept of ubiquitous connectivity to the online digital universe is here to stay. This is the most compelling voice entry point for users.

Simplicity, user choice, privacy and trust

Most people already have their banking, communications, social networking, navigation, travel and payment apps on their smartphones. They already know how to use them (simplicity). They already know which ones to use for what purpose: Slack for work, WhatsApp for friends, Messenger for family (user choice). They’ve already registered and set them up, and the apps give them control over what information goes where; for instance, their portfolio may be in their banking app, while their contacts are on the phone. They know which app sees what data (privacy). They also trust apps to protect them and their data (trust).

Imagine a voice assistant platform that simply lets users use the apps they already use, on the go, anytime and anywhere, with simple voice commands, without having to register these service relationships again, and without waiting for developers to reinvent the wheel to plug into the platform.

We must embrace mobile app actions as first-class citizens. We should be able to do things in our mobile apps with simple voice commands. We must provide user choice and personalize user experience without registration and without compromising privacy and trust.

Anywhere, everywhere, on-the-go. On the smartphone in your pocket or your handbag. Ubiquitous and free.

