We’re in the middle of another developer conference season. And like last year, artificial intelligence has been one of the main highlights of Google I/O, Microsoft Build and Facebook F8 (Apple’s WWDC is a few weeks away, and I expect more AI announcements there as well).
After causing excitement and panic with its Duplex technology last year, Google proved again this year that it still holds the edge over its competitors when it comes to stealing the show with its AI innovations. Amidst its many announcements, I found Google’s ability to perform on-device machine learning to be the most important.
Apparently, Google thinks so too, because Sundar Pichai made the announcement himself, describing the complexity of turning speech to text and how his company has managed to leverage advances in deep learning to bring the power of Google’s AI to the device that fits in your pocket.
With artificial intelligence becoming more and more important in the many tasks we perform every day, an increasing number of companies are becoming interested in moving AI tasks from the cloud to the edge. And when edge AI picks up steam, it’ll pave the way for much more than real-time speech processing by your digital assistant.
The challenges of running AI at the edge
Pichai eloquently described the challenges of speech recognition AI and natural language processing in his I/O 2019 keynote:
“In order to process speech today, we rely on complex algorithms that include multiple machine learning models. One model maps incoming sound bytes into phonetic units. Another one takes and assembles these phonetic units into words. And then a third model predicts the likelihood of these words in a sequence.”
The AI algorithms that perform these tasks require huge amounts of data and computing power, which have traditionally been beyond the capacity of mobile devices and personal computers. That’s why AI assistants and smart speakers need a connection to the cloud to process voice commands.
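To make the three stages in Pichai’s description concrete, here is a minimal Python sketch. It is only an illustration, not Google’s implementation: the lookup tables (ACOUSTIC, LEXICON, BIGRAM), the frame labels and the scores are all made up, and it assumes the audio has already been split into word-sized groups. Real systems use trained models at each stage.

```python
from itertools import product

# All tables below are hypothetical stand-ins for trained models.
ACOUSTIC = {"a1": "R", "a2": "AY", "a3": "T", "a4": "N", "a5": "AW"}
LEXICON = {("R", "AY", "T"): ["right", "write"], ("N", "AW"): ["now"]}
BIGRAM = {("right", "now"): 0.9, ("write", "now"): 0.1}


def acoustic_model(frames):
    """Stage 1: map incoming sound frames to phonetic units."""
    return [ACOUSTIC[frame] for frame in frames]


def pronunciation_model(phonemes):
    """Stage 2: assemble phonetic units into candidate words."""
    return LEXICON[tuple(phonemes)]


def sequence_score(words):
    """Stage 3: likelihood of a word sequence, from toy bigram scores."""
    score = 1.0
    for prev, cur in zip(words, words[1:]):
        score *= BIGRAM.get((prev, cur), 0.01)  # small default for unseen pairs
    return score


def transcribe(frame_groups):
    """Chain the three stages and keep the most likely word sequence."""
    candidates = [pronunciation_model(acoustic_model(g)) for g in frame_groups]
    best = max(product(*candidates), key=sequence_score)
    return " ".join(best)


print(transcribe([["a1", "a2", "a3"], ["a4", "a5"]]))  # right now
```

Both “right” and “write” match the same phonetic units. Only the third stage, which scores word sequences, can choose between them, which is why the pipeline needs all three models.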
According to Pichai, the AI models that run Google’s speech recognition algorithms were over 100 gigabytes in size. He likens running the machine learning algorithms on mobile devices to “putting the power of a Google data center in your pocket, an incredibly challenging computer science problem.”
But recent advances in AI research have enabled the company to shrink the size of the AI models to half a gigabyte, small enough to fit them in smartphones.
Why is this important?
On-device AI reduces latency
Latency has been one of the main challenges of cloud-based AI. Every time you want to use AI to process an audio file or image or video, you must upload the data to the cloud where the deep learning models reside.
The round trip causes a delay that can range from annoying to game-breaking. For the Google Assistant, the delay can make the experience clunky and slow, forcing users to wait a few seconds every time they utter a command to the AI assistant.
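A rough way to see where the delay comes from is to add up the pieces of a cloud request and compare them with local processing. The sketch below is a back-of-the-envelope model, not a measurement: the function names, the parameters and every number in the example call are assumptions chosen only for illustration.

```python
def cloud_latency_ms(payload_kb, uplink_kbps, rtt_ms, server_infer_ms):
    """Upload time + network round trip + inference on the server."""
    # kilobytes -> kilobits, divided by kilobits per second, converted to ms
    upload_ms = payload_kb * 8 * 1000 / uplink_kbps
    return upload_ms + rtt_ms + server_infer_ms


def on_device_latency_ms(device_infer_ms):
    """No upload and no round trip: only local inference time remains."""
    return device_infer_ms


# Hypothetical figures: a 100 KB audio clip over a 2 Mbps uplink.
cloud = cloud_latency_ms(payload_kb=100, uplink_kbps=2000, rtt_ms=100, server_infer_ms=50)
local = on_device_latency_ms(device_infer_ms=150)
print(f"cloud: {cloud:.0f} ms, on-device: {local:.0f} ms")
```

Even if the phone is slower at inference than a data center, in this toy model the upload and network terms dominate the cloud total, and they disappear entirely when the model runs on the device.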
With Google’s on-device machine learning technology, the Assistant becomes ten times faster, and users can give it consecutive commands in real time. Google displayed the power and speed of on-device machine learning in a well-prepared demo. As Scott Huffman, VP of Engineering at Google Assistant, described, the AI’s on-device voice processing was so fast that “tapping to operate your phone would almost seem slow.”
(On a side note, it would be interesting to see if the AI performs just as well in real-world scenarios as it does in the canned demo. We’ll have to wait some more for that.)
Google also showed how the real-time voice processing AI could streamline multitasking across apps, such as looking up information and photos while chatting or composing emails. The same voice recognition technology can also transcribe videos playing on your phone in real time.
In the case of Google Assistant, reducing delays will improve the user experience and create the possibility for new ways to use your phone. In other settings, such as robot surgeons, autonomous rescue drones and self-driving cars, zero-delay access to AI processing can save lives and prevent fatal accidents.
Improving the privacy of AI applications
Privacy concerns have been among the biggest challenges of deep learning applications. Companies such as Google and Amazon need access to user data to train their deep learning models.
Every time you utter a voice command to the Google Assistant or Amazon Alexa, your voice is recorded and sent to the cloud. Every email you compose in Gmail is processed by the AI algorithms in Google’s cloud. The same goes for pictures you upload to Google Photos and Facebook (even if you’re not sharing them publicly).
To be clear, storing data in the cloud per se is not a privacy problem. End-to-end encryption can ensure that only you have access to your data, even when you’re storing it in the cloud. But AI companies can’t use these privacy-enhancing technologies because doing so would deprive them of the vital food for their data-hungry deep learning algorithms.
Big tech companies have a poor track record of handling the sensitive data of their customers. The past few years have seen countless privacy blunders that range from recording private conversations to sharing data with employees without users’ consent. The bottom line is, you don’t really know what happens to your data when you give it up for AI processing.
On-device AI can make sure that your data stays on your device and doesn’t get uploaded to some cloud server. This can be very important for settings such as AI-powered home security cameras that constantly stream video from your private quarters to the cloud, and smart speakers that are recording audio from your home.
Beyond users, on-device machine learning can also be a huge benefit to tech companies that want to offer AI smarts to their users while also overcoming the mounting challenge of staying compliant with privacy rules such as the EU’s GDPR and California’s CCPA.
Storing nothing on your servers is still the best way to make sure you don’t run afoul of regulations that will penalize you if you don’t protect user data against security threats.
Reducing power consumption of AI models
Power consumption has become a major problem for artificial intelligence algorithms. The resources that need to be allocated for deep learning models running in the cloud require huge amounts of electricity, creating a dual challenge of costs and carbon footprint.
Moreover, the need to constantly transfer data from edge devices to the cloud, where the AI brains reside, adds electricity overhead as well as extra costs for networking and data transfer.
“The physics of moving data around just seems to require a lot of energy. There seems to be a rule that the energy an operation takes is proportional to how far you have to send the bits,” writes machine learning expert Pete Warden in a post that explores the benefits of fitting AI models into tiny edge computers. “CPUs and sensors send bits a few millimeters, and is cheap, radio sends them meters or more and is expensive. I don’t see this relationship fundamentally changing, even as technology improves overall. In fact, I expect the relative gap between the cost of compute and radio to get even wider, because I see more opportunities to reduce computing power usage.”
What this means is that, from a power consumption perspective, performing AI tasks at the edge will become more and more efficient as companies continue to develop specialized hardware for running deep learning algorithms. In fact, Xnor, a company we interviewed earlier this year, has developed a prototype AI accelerator that can run neural networks for several years with just a cell battery and a solar power unit.
Performing AI tasks in offline settings
Few things are as frustrating as an application that stops working when you lose your connection to the cloud. Google’s on-device technology can ensure that you can use the Assistant to navigate your apps even when you don’t have an internet connection (of course, apps that require internet connectivity will not work if you’re offline).
In other settings, offline access to AI models can be critical. This is especially important for AI-powered rescue drones, which often have to operate in areas where there’s no internet connectivity.
Let’s not get too excited about Google’s on-device AI—yet
While I’m very intrigued by Google’s on-device machine learning capabilities, I still see some challenges ahead that need to be figured out.
First, I’m interested to see if Google is really sincere about not wanting to collect voice commands. Google has clearly shifted toward showing more respect for user privacy in the past couple of years. But let’s not forget that the company has built its economic empire on collecting and mining user data. It’s hard to see how the company plans to keep its edge in the fast-evolving domain of voice user interfaces without hoarding data, especially as its competitors show no sign of relenting in their practices.
Second, even though I mentioned power consumption as one of the pros of edge AI, I’m curious to see how Google’s on-device machine learning capabilities will affect the battery life of devices.
Finally, real-time AI has some direct consequences on user experience and expectations. For instance, when you’re giving five to ten voice commands to Google Assistant every minute, you would expect it to be smarter than an AI assistant you summon a few times per day. At the very least, it would be super-frustrating if you had to say “Hey Google” every time.
Google has made the experience smoother by adjusting the AI assistant to answer continuous commands without requiring the wake word. As the demo shows, Google’s AI team has worked hard to make sure the Assistant can resolve references in sequences. For instance, when the presenter asks Google “What’s the weather today?” followed by “How about tomorrow?” the AI assistant figures out that the second question is a reference to the previous one. In another part of the demo, the presenter says, “Turn the flashlight on. Turn it off.” Again, Google Assistant knows that “it” in the second sentence is a reference to “flashlight.”
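One naive way to picture this kind of reference resolution is an assistant that remembers the last intent and the last device mentioned. The Python sketch below is a toy built around the two demo exchanges; the DialogueContext class and its rules are hypothetical and say nothing about how Google Assistant actually works.

```python
class DialogueContext:
    """Toy context tracker: remembers the last intent and the last device."""

    def __init__(self):
        self.last_intent = None
        self.last_entity = None

    def interpret(self, utterance):
        # Normalize case, curly apostrophes and trailing punctuation.
        text = utterance.lower().replace("’", "'").strip(" .?!")
        words = text.split()
        if text.startswith("what's the weather"):
            self.last_intent = "weather"
            return ("weather", words[-1])
        if text.startswith("how about") and self.last_intent:
            # Follow-up question: reuse the previous intent.
            return (self.last_intent, words[-1])
        if text.startswith("turn the"):
            self.last_entity = words[2]  # e.g. "flashlight"
            return ("switch", self.last_entity, words[-1])
        if text.startswith("turn it") and self.last_entity:
            # "it" resolves to the most recently mentioned device.
            return ("switch", self.last_entity, words[-1])
        return ("unknown", utterance)


ctx = DialogueContext()
for line in ["What's the weather today?", "How about tomorrow?",
             "Turn the flashlight on.", "Turn it off."]:
    print(ctx.interpret(line))
```

The hand-written rules only cover the exact phrasings from the demo. Anything that strays from them falls through to “unknown,” which hints at how quickly reference resolution gets hard outside a scripted exchange.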
However, resolving references can become challenging in real-life situations. We humans can carry out more than one conversation simultaneously by changing our tone or the direction of our gaze. In many cases, we make eye contact to determine who we are addressing.
AI assistants have none of these capabilities. Nor do they have a proper understanding of the context of human language. The wake word is how they make sure (in most cases) that they don’t confuse idle conversation with a command.
How will Google Assistant react if you suddenly break your string of commands and interact with someone else nearby? How can you tell if you’ve paused for too long and need to say “Hey Google” again before interacting with the Assistant? How smooth will the experience of guiding the assistant out of a mistaken command be? Is Google trekking too deep into the dangerous domain of anthropomorphizing AI technology?
We will have to wait until Google releases its “next generation” AI assistant before we can answer these questions and many others.