Protecting AI models against audio adversarial attacks

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Imagine the following scenario: A friend (at least you think it’s a friend) sends you an email with a link to a YouTube song. But as you click the link and enjoy the music, the song sends hidden commands to your AI-powered gadgets and home appliances, ordering them to do things such as opening doors and disabling alarms.

This is not a scene from an AI apocalypse flick. It can happen in real life. Like today.

Thanks to advances in artificial intelligence, computers have become increasingly effective at understanding and processing voice commands. Voice-based AI assistants such as Amazon’s Alexa, Apple’s Siri and Microsoft’s Cortana have become a fact of everyday life.

But while AI makes life easier, it also introduces its own set of vulnerabilities and security threats. This has become the focus of many researchers and organizations who want to make sure that the next generation of AI technologies powering our computing devices doesn’t become a target of cyberattacks.

In an interview with TechTalks, Pin-Yu Chen, an AI researcher at IBM, discussed the unique threats against machine learning–powered speech recognition systems and ways to create robust models that can withstand hidden adversarial attacks. Along with researchers from the University of California, Berkeley, the University of Illinois, and Shanghai Jiao Tong University, Chen has compiled the findings in a paper titled “Characterizing Audio Adversarial Examples Using Temporal Dependency,” which he will be presenting next week at the seventh International Conference on Learning Representations (ICLR 2019).

Audio adversarial attacks

The scenario described at the beginning of the article is called an “adversarial example.” The goal of adversarial examples (aka adversarial attacks) is to manipulate the behavior of AI models while going unnoticed by humans.

Machine learning and deep learning algorithms, currently the most popular types of artificial intelligence technology, are especially vulnerable to adversarial attacks because they develop their behavior by examining large sets of data and creating mathematical representations of the patterns and correlations they find between similar examples.

When you provide a machine learning model with an input, it will return one or several confidence scores that represent its statistical similarity to the different classes of data the AI model has been trained on.

For instance, when you feed a photo to an image classifier AI, it will tell you that there’s a 95-percent chance that it contains a human face. Other AI algorithms might give more granular information, such as the emotion on the subject’s face. In the same manner, speech recognition algorithms map audio waveforms to text.
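To make the idea of confidence scores concrete, here is a minimal Python sketch of how a classifier’s raw scores (logits) are turned into percentages like the one above. The class names and logit values are hypothetical, not taken from any particular model:

```python
import numpy as np

def softmax(logits):
    """Convert raw model scores (logits) into confidence scores that sum to 1."""
    exp = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp / exp.sum()

# Hypothetical logits an image classifier might produce for one photo
classes = ["human face", "cat", "background"]
logits = np.array([4.2, 0.3, 1.1])

for label, score in zip(classes, softmax(logits)):
    print(f"{label}: {score:.1%}")  # the "human face" class dominates (~94%)
```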

Adversarial examples make subtle changes to the input of a machine learning model in a way that causes its output to change. For instance, adding a layer of noise to the image of a panda will cause an AI to classify it as a gibbon. But to a human, it will still look the same.

Image source: Medium
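The panda-to-gibbon example comes from the fast gradient sign method (FGSM) family of attacks. As a rough sketch of how such a perturbation is computed, the snippet below assumes `model` is some differentiable PyTorch image classifier; it illustrates the general technique, not the specific code used in the research discussed here:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Nudge every pixel slightly in the direction that most increases the
    classification loss; with a small epsilon the change is imperceptible."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()  # keep pixel values in a valid range
```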

Audio-based AI systems are equally vulnerable to adversarial examples.

Researchers have shown that it’s possible to create audio that sounds normal to humans, but that automatic speech recognition (ASR) systems will pick up as commands such as opening a door or visiting a malicious website.

“In audio adversarial examples, humans basically hear the same thing, but the AI might pick up some hidden command if the music file has adversarial noise,” Chen says.

Like their visual counterparts, audio adversarial examples are created by adding a layer of noise to the input data. The changes must be subtle enough to be inaudible to human ears, while at the same time altering the statistical properties of the input enough to produce a different output when processed by the AI.

“Adversarial examples piggyback the hidden voice command on a regular audio file like a YouTube song. For a human, it’s like you’re hearing a Taylor Swift song. But for the audio AI model, it will hear something else and will execute a stealthy command without catching the user’s attention,” Chen says.
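As a rough illustration of what “piggybacking” a perturbation on a song looks like in code, here is a sketch only: in a real attack the perturbation `delta` is optimized against the target ASR model (not shown here), and the names and the amplitude bound are illustrative assumptions.

```python
import numpy as np

def embed_perturbation(song, delta, max_amplitude=0.002):
    """Add an adversarial perturbation `delta` to a song waveform, clipping it
    so the added noise stays far below easily audible levels.

    In a real attack, `delta` is found by optimizing against the target ASR
    model so the perturbed audio transcribes to the attacker's command; here
    it is just a placeholder array of the same length as `song`."""
    delta = np.clip(delta, -max_amplitude, max_amplitude)
    adversarial = np.clip(song + delta, -1.0, 1.0)  # keep samples in valid range
    return adversarial
```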

Old methods to protect AI models against adversarial examples

Image credit: Depositphotos

Adversarial attacks against computer vision AI systems, such as image classifiers, facial recognition systems and emotion detectors, have been well studied. Naturally, this is due to the broader scope of sensitive tasks that computer vision algorithms are involved in, including driving autonomous vehicles, diagnosing diseases and identifying faces.

Earlier this year, Chen talked to TechTalks about two techniques developed by IBM Research that help identify and patch image-based adversarial vulnerabilities in AI models. Several other researchers have led similar efforts.

Unfortunately, there hasn’t yet been the same focus on audio adversarial attacks. Chen and his colleagues tried to apply lessons learned from the image domain to audio AI models.

“Some of the defenses that have been developed already show promise on image-based AI models. We wanted to see if we could transfer the experience to the audio domain and be effective on audio adversarial attacks as well,” Chen says.

The team tried some of the known and tested defenses such as compression and down-sampling, which have proven to be effective in the visual domain. Running the input data through a compression algorithm before passing it over to the AI model can filter out or disrupt the adversarial noise. But when it came to audio AI models, the same methods were not effective enough.

“These kinds of defenses are known as input transformation. This works well in images. You can do JPEG compression, resizing, or reduce the bits for images. But for audio, as long as the attacker knows you’re doing input transformation, it’s very easy to develop an advanced attack to bypass the transformation,” Chen says. “The lessons learned from images do not really transfer to the audio domain, and that’s why we need to think about new ways to defend against audio adversarial attacks.”
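For reference, input transformations are simple preprocessing steps applied before the audio reaches the model. Below is a minimal sketch of two such transformations, bit-depth reduction and naive down-sampling; the exact transformations and parameters the team tested may differ.

```python
import numpy as np

def quantize(waveform, bits=8):
    """Reduce bit depth: round each sample to a coarser grid, which can wash
    out low-amplitude adversarial noise (but also degrades legitimate audio)."""
    levels = 2 ** bits
    return np.round(waveform * (levels / 2)) / (levels / 2)

def downsample(waveform, factor=2):
    """Naive down-sampling: keep every `factor`-th sample."""
    return waveform[::factor]
```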

Leveraging temporal dependency to detect adversarial examples

Artificial intelligence algorithms require different architectures and models to process different data types. For instance, image-based AI models are usually based on convolutional neural networks (CNN), which are more suitable for capturing spatial consistency.

But speech and audio have temporal dependency, which means processing each bit of data is closely dependent on the sounds that come before and after it. That’s why AI engineers usually use recurrent neural networks (RNN) for speech recognition and transcription. RNNs excel at capturing temporal and sequential neighborhood information.
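The key property here is that a recurrent network’s hidden state at each time step depends on everything that came before it. A toy illustration of that recurrence in plain NumPy (with hypothetical weight matrices `W_h`, `W_x` and bias `b`, not a real ASR model):

```python
import numpy as np

def rnn_over_frames(features, W_h, W_x, b):
    """Minimal recurrent step over a sequence of audio feature frames.
    Each hidden state depends on the current frame *and* the previous state,
    which is how an RNN captures temporal dependency."""
    hidden = np.zeros(W_h.shape[0])
    states = []
    for frame in features:  # frames must be processed in order
        hidden = np.tanh(W_h @ hidden + W_x @ frame + b)
        states.append(hidden)
    return np.stack(states)
```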

Chen and his colleagues tried to leverage this unique nature of audio AI models to develop their defense method.

“We know that voice and text have a strong temporal dependency in input space. And that’s how it’s being used in automatic speech recognition and RNNs. The question we wanted to answer was, if these AI models leverage the specific properties of their domain to perform their tasks, can we also leverage these same properties to build more robust models that are resilient to adversarial attacks?” Chen says.

The AI researchers discovered that they could use temporal dependency as a metric to tell the difference between normal audio and adversarial examples.

Interestingly, the researchers observed that the temporal dependency of audio data also plays an important role in developing adversarial examples. “What we’ve observed is that if you want to manipulate how an AI transcribes an audio input, it seems that you have to modify the entire sequence. This is how machine learning models work. It has to take into account the voice before and after every part,” Chen says.

The researchers turned this to their advantage.

A simple but effective way to protect AI models

To detect whether an audio sample is legitimate or contains a hidden adversarial attack, Chen and his colleagues first run the data in its entirety through the voice recognition AI and register the result. They then split the input into two or more parts and run each of those parts separately through the AI.

“If the input audio is legitimate, the transcribed sentence in the second pass will be like a chopped off version of the complete audio file. So if you take the first half of the audio, then the output should be the first half of the original output,” Chen says.

But if the audio sample has been manipulated with adversarial noise, there will be a large difference between the results of the entire sequence and its chopped-up parts. “Because the goal of adversarial examples is to manipulate the output of the AI model, if we disrupt the structure of the audio input by considering only a part of it, then the transcribed results will be very different from the manipulated output,” Chen says.
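Put together, the detection check can be sketched roughly as follows. Here `transcribe` is a stand-in for any speech recognition system (audio samples in, text out), and a simple character-level edit distance stands in for the consistency metrics (such as word error rate) the paper actually evaluates; the split point and threshold are illustrative assumptions.

```python
def levenshtein(a, b):
    """Plain edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def temporal_dependency_score(transcribe, audio, split=0.5):
    """Sketch of the temporal-dependency check described above.

    Transcribe the full clip, then transcribe only its first half, and compare
    that partial transcription with the matching prefix of the full one.
    Benign audio gives a close match; adversarial audio tends to diverge."""
    full_text = transcribe(audio)
    k = int(len(audio) * split)
    partial_text = transcribe(audio[:k])

    prefix = full_text[: len(partial_text)]
    distance = levenshtein(partial_text, prefix)
    return distance / max(len(prefix), 1)  # high score -> likely adversarial
```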

In their paper, the researchers share several examples of how splitting audio input and running it through AI models can reveal their benign or adversarial nature.

While the core idea behind the temporal dependency defense method is very simple, it is also very effective at protecting AI models against audio adversarial attacks.

The researchers tested temporal dependency detection against different methods of creating audio adversarial examples. These include adversarial examples created with genetic algorithms, optimization-based approaches, and over-the-air attacks (described at the beginning of this article).

“Our results show that temporal dependency does a good job of protecting against all these types of attacks,” Chen says. “We also tried to break our own defense by creating an advanced attack that knows we’re using temporal dependency. But the best attacks we could think of could not break the defense.”

According to Chen, temporal dependency can protect against both black-box attacks (where attackers have no knowledge of the underlying structure of the AI model they’re targeting) and white-box attacks (where the attacker has full knowledge of the architecture of the targeted model).

“The temporal dependency technique is designed to defend against white-box attacks. So it means that it’s also resilient against other, weaker types of black-box attacks,” Chen says.

Understanding the scope of the threat to AI systems

Adversarial attacks against AI systems are a serious threat. But they’re also very hard to produce, in both the image and audio domains. For instance, for an adversarial attack such as the one mentioned at the beginning of this article to succeed, several requirements must be met: The target must click on the link (obviously); the computer must play the song through its speakers; the smartphone must be close enough to pick up the sound; and it must be running the targeted smart home app that works with AI voice recognition. Even if all these conditions are met, small changes such as background noise can disrupt the attack.

But Chen warns that adversarial attacks should not be taken lightly. “We are very cautious about these types of emerging attacks, because how these AI models usually work is that we start with a base model, and we do some fine-tuning or transfer learning to adapt the model to the end user’s needs. But basically, the base model and the base architecture remain the same,” he says. What this means is that if the base model has vulnerabilities, these weaknesses might transfer to the modified models.

“There are many open-source speech recognition systems available on the internet. A lot of apps and a lot of smartphone manufacturers use these open-source, out-of-the-box ASR systems in their apps and devices,” Chen notes. “If the model on your device happens to be known to the attacker, then the over-the-air attack becomes easier, because the adversarial examples can be generated from the same type of model.”
