As with all its events, Google made a big splash at the “Made by Google 2018” conference it held earlier this week, where the company revealed its new flagship Pixel 3 phones along with other consumer hardware and accessories. And as with its previous events, this one was accompanied by plenty of unwarranted hype around artificial intelligence, a domain that has become one of the main battlegrounds between large tech companies.
One of the highlights of the event was Google’s “Call Screen” feature, which will be available on all Pixel phones very soon. The feature, as Google’s presenters declared, enables your phone to answer calls in your place when you can’t or don’t want to pick up.
To be fair, it’s a neat feature, and it can get rid of annoying spam and telemarketer calls. But the way Call Screen was introduced and reported involved a lot of exaggeration about the AI capabilities it sports.
“What’s interesting about Google’s approach to identifying and blocking unwanted calls is that it’s leverag[ing] artificial intelligence (AI) to fight this ongoing battle,” reads HotHardware’s coverage, under the headline “Google Pixel 3 Duplex And AI Call Screening Deliver Knock-Out Punch To Telemarketers.”
USA Today’s Edward Baig described Call Screen as “The AI feature I’m most jazzed about.”
Other publications used similar headlines and described Call Screen as some sort of smart agent that uses sophisticated AI to converse with its interlocutor. But after watching the various demonstrations of Call Screen, I would rather describe it as an advanced and useful answering machine that uses some well-known, not-too-sophisticated artificial intelligence capabilities to solve a real problem.
Unpacking the AI behind Google Call Screen
First off, during the presentation, Call Screen was cleverly bundled with Google’s Duplex technology, which in all fairness does use some advanced deep learning techniques to synthesize natural voice and engage in conversations powered by natural language processing and generation (NLP/NLG). When introduced in May, Duplex caused a lot of controversy surrounding the ethics of AI-generated conversation and gave the impression that the technology is so advanced that it will be able to carry on open-ended conversations without its human interlocutors finding out they are speaking to an AI agent. (Reality check: It’s not. It has the same limits that all deep learning systems suffer from.)
“We’re going even further,” says Liza Ma, product manager at Google, before presenting Call Screen, which implicitly means that what comes after is on par with or more advanced than Duplex’s AI capabilities.
“Just tap the screen call button, and your phone will answer for you and ask who’s calling and why,” Ma explains further in the presentation. Again, the feature is described in a way that personifies the system and gives the impression that your phone is engaging in an active, intelligent conversation. I don’t see the point of this when the same could be achieved with a pre-recorded or synthesized voice message.
“The conversation is transcribed in real-time on your screen,” Ma continues. This is perhaps the most advanced “AI feature” that Call Screen sports. The technology is well-known: It’s called “speech-to-text” and transforms human speech into written text.
Speech-to-text, or automated transcription, has existed for a long time, but before advances in machine learning and deep learning, it was clunky at best, not suited for consumer applications, and certainly not for real-time use. Deep learning algorithms and neural networks make it possible to transcribe audio in near-real time, provided they have been trained on enough quality data.
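To make the decoding step of neural speech-to-text concrete, here is a toy sketch of greedy CTC decoding, the scheme many deep learning transcription systems use to turn per-frame character probabilities into text. Everything here (the tiny alphabet, the fabricated frame probabilities) is an illustrative assumption, not Google’s actual pipeline; the acoustic model that would normally produce the probabilities is not shown.

```python
import numpy as np

# Greedy CTC decoding: an acoustic model (not shown) emits, for each audio
# frame, a probability distribution over characters plus a special "blank"
# symbol. Decoding picks the most likely symbol per frame, collapses
# consecutive repeats, and drops blanks.

ALPHABET = ["-", "h", "e", "l", "o"]  # index 0 is the CTC blank


def greedy_ctc_decode(frame_probs):
    """frame_probs: (num_frames, len(ALPHABET)) array of per-frame probabilities."""
    best = np.argmax(frame_probs, axis=1)  # most likely symbol per frame
    # Collapse runs of the same symbol ("hh" -> "h"), keeping blank boundaries.
    collapsed = [best[0]] + [b for prev, b in zip(best, best[1:]) if b != prev]
    return "".join(ALPHABET[i] for i in collapsed if i != 0)  # drop blanks


# Nine one-hot frames spelling "hello": h h e <blank> l l <blank> l o
frames = np.eye(5)[[1, 1, 2, 0, 3, 3, 0, 3, 4]]
print(greedy_ctc_decode(frames))  # -> hello
```

The blank symbol is what lets the decoder distinguish a repeated letter (“ll”) from one letter held across several frames, which is the core trick behind this family of transcription models.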
This happens to be one of Google’s strengths. The company has billions of hours’ worth of transcribed YouTube videos and processes hundreds of millions of voice search queries every day. So if any company is positioned to master real-time transcription, it’s Google.
At this point, Call Screen gives the user several options: pick up the call, mark it as spam, or ask for more information. But all the options and automated responses are generic. For instance, the “Tell me more” and “Mark as spam” commands could apply to any conversation.
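The generic nature of these options is worth dwelling on: they amount to a fixed lookup table, not responses generated from the conversation. A minimal sketch, with option names and reply strings invented for illustration (Google has not published how Call Screen is wired internally):

```python
# Sketch of Call Screen-style canned responses as a plain lookup table.
# The option keys and reply strings are assumptions for illustration;
# note that nothing here depends on what the caller actually said.

CANNED_RESPONSES = {
    "tell_me_more": "Can you tell me more about why you're calling?",
    "report_spam": "Please remove this number from your contact list.",
    "pick_up": None,  # hand the call to the user; no synthesized reply
}


def screen_action(option):
    """Return the fixed reply for a tapped option, or None to take the call."""
    if option not in CANNED_RESPONSES:
        raise ValueError(f"unknown option: {option}")
    return CANNED_RESPONSES[option]


print(screen_action("tell_me_more"))
```

Because the replies are the same regardless of context, the “conversation” is really the caller talking to an answering machine with a few buttons, which is exactly the point.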
“This all happens on-device, powered by Google’s AI,” Ma concludes. Again, there’s a catch here. The on-device part is truly interesting, perhaps the most interesting part of the technology. But the “Google’s AI” part deserves some reflection.
On-device voice transcription (which most media outlets ignored in their coverage of Call Screen) is significant, because it shows advances in edge AI, or better put, processors that can run neural networks without the need for a connection to the cloud. Apple uses a similar technology for FaceID, the facial authentication technology on its iPhone X devices.
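What “on-device” means in practice can be shown with a toy forward pass: once a trained network’s weights are shipped with the phone, inference is just local arithmetic, with no round trip to a server. The two-layer network, its random weights, and the speech/noise labels below are all made up for demonstration; they stand in for whatever model Google actually deploys.

```python
import numpy as np

# Illustrative "edge AI" inference: the weights live on the device, so
# classifying an input is pure local computation. Weights are random
# placeholders here, not a trained model.

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 8)), np.zeros(8)  # layer 1 (16 -> 8)
W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)   # layer 2 (8 -> 2)


def on_device_infer(features):
    """Run a tiny two-layer network locally and return a class index."""
    hidden = np.maximum(features @ W1 + b1, 0.0)  # ReLU activation
    logits = hidden @ W2 + b2
    return int(np.argmax(logits))  # e.g. 0 = speech, 1 = noise (assumed labels)


label = on_device_infer(rng.standard_normal(16))
print(label)
```

The privacy and latency benefits Google touts follow directly from this structure: the input features never leave the function, let alone the device.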
This means that Google won’t need to send the caller’s voice to the cloud to transform it into text. I’m interested to see how the technology performs with different accents and background noise. I suppose that since Call Screen will mostly be used to answer telemarketers, the environment is predictable: The caller will be someone who speaks with a clear accent and sits in a room without much background noise, so Call Screen’s neural network shouldn’t run into trouble. As for other spam callers, I don’t think the user really cares about the quality of the transcription, because they will probably dismiss the call anyway.
The on-device pitch also serves as marketing for the company, which is often accused of aggressively collecting user data.
The value of Call Screen isn’t in AI
However, “powered by Google’s AI” was the problematic part of Ma’s pitch. As far as we know, Google’s AI power lies in its Tensor Processing Units, its Google Brain technology, and the countless other technologies that run in the cloud, not on the edge. That makes the claim vague, since Call Screen runs on the device. A more accurate phrase would be “powered by Pixel 3’s Snapdragon 845 processor and a neural network developed by Google’s engineers.” Or preferably, they could have omitted it altogether.
Saying something is “powered by AI” without being specific about the technology it uses creates unwarranted hype and expectations. Yes, AI sells these days, but that doesn’t mean you must tack it onto every single product you introduce.
As I discussed earlier this week, the term AI doesn’t have a clear definition. It’s a moving target that changes with time and technological progress. It will always sound magical and beyond grasp compared to today’s techniques. Real-time voice transcription, however, is a very distinct and well-known technology. We know its capabilities and limits. It might be cool today but sound mundane in a few years, when every edge device with a specialized ASIC will be able to perform it.
So does Google’s Call Screen use AI?
Compared to Google Lens, Duplex, and all the machine learning–powered features of Pixel 3’s camera application, Call Screen’s voice transcription capabilities are pretty dumb and dull. In fact, if you stick to the general definition of artificial intelligence, which implies a sort of smart decision-making that would otherwise require human intelligence, I don’t think Pixel’s Call Screen even qualifies as AI.
But this doesn’t mean Call Screen is useless. Far from it. In my opinion, the question of whether Call Screen uses AI is irrelevant.
In fact, Pixel 3’s Call Screen is a very useful feature, and if I had a Pixel phone, I would probably use it more frequently than all the other high-tech features Google introduced at the event. That is because to me, spam calls are a much bigger problem than automated restaurant booking or keeping focus on a puppy as it moves around in a video.
Call Screen proves that you don’t necessarily need to employ the most advanced technology to build good software. Bottom line: The value of an application is in the problem it solves, not the technology it employs.
Image source: YouTube