How to trick deep learning algorithms into doing new things

Robot Rubik's Cube
How do you teach an AI algorithm new tricks? (Image credit: Depositphotos)

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Two things often mentioned with deep learning are “data” and “compute resources.” You need a lot of both when developing, training, and testing deep learning models. When developers don’t have a lot of training samples or access to very powerful servers, they use transfer learning to finetune a pre-trained deep learning model for a new task.

At this year’s ICML conference, scientists at IBM Research and Taiwan’s National Tsing Hua University introduced “black-box adversarial reprogramming” (BAR), an alternative repurposing technique that turns a supposed weakness of deep neural networks into a strength.

BAR builds on the original work on adversarial reprogramming and on previous work on black-box adversarial attacks, making it possible to extend the capabilities of deep neural networks even when developers don’t have full access to the model.

Pretrained and finetuned deep learning models

When you want to develop an application that requires deep learning, one option is to create your own neural network from scratch and train it on available or curated examples. For instance, you can use ImageNet, a public dataset that contains more than 14 million labeled images.

There are several problems, however. First, you must find the right architecture for the task, such as the number and sequence of convolution, pooling, and dense layers. You must also decide on the number of filters and parameters for each layer, the learning rate, optimizer, loss function, and other hyperparameters. Many of these decisions require trial-and-error training, which is slow and costly unless you have access to powerful graphics processors or specialized hardware such as Google’s TPU.

To avoid reinventing the wheel, you can download a tried-and-tested architecture such as AlexNet, ResNet, or Inception, and train it yourself. But you’ll still need a cluster of GPUs or TPUs to complete the training in an acceptable amount of time. To avoid the costly training process, you can download pre-trained versions of these models and integrate them into your application.

Alternatively, you can use a service such as Clarifai or Amazon Rekognition, which provides application programming interfaces for image recognition tasks. These services are “black-box” models because the developer doesn’t have access to the network layers and parameters and can only interact with them by submitting images and retrieving the resulting labels.

Now, suppose you want to create a computer vision algorithm for a specialized task, such as detecting autism from brain scans or breast cancer from mammograms. In this case, a general image recognition model such as AlexNet or a service like Clarifai won’t cut it. You need a deep learning model trained on data for that problem domain.

The first problem you’ll face is gathering enough data. A specialized task might not require 14 million labeled images, but you’ll still need quite a few if you’re training the neural network from scratch.

Transfer learning allows you to slash the number of training examples. The idea is to take a pre-trained model (e.g., ResNet) and retrain it on the data and labels from a new domain. Since the model has been trained on a large dataset, its parameters are already tuned to detect many of the features that will come in handy in the new domain. Therefore, it will take much less time and data to retrain it for the new task.

deep learning transfer learning
Transfer learning finetunes the parameters of a pre-trained neural network for a new task

While it sounds easy, transfer learning is itself a complicated process and does not work well in all circumstances. Depending on how close the source and target domains are, you’ll need to freeze and unfreeze layers and add new layers to the model during transfer learning. You’ll also need to do a lot of hyperparameter tweaking in the process.

In some cases, transfer learning can perform worse than training a neural network from scratch. You also can’t perform transfer learning on API-based systems where you don’t have access to the deep learning model.

Adversarial attacks and reprogramming

Adversarial reprogramming is an alternative technique for repurposing machine learning models. It leverages adversarial machine learning, an area of research that explores how perturbations to input data can change the behavior of neural networks. For example, in the image below, adding a layer of noise to the panda photo on the left causes the award-winning GoogLeNet deep learning model to mistake it for a gibbon. The manipulations are called “adversarial perturbations.”

artificial intelligence adversarial example panda
Adding a layer of noise to the panda image on the left turns it into an adversarial example
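
As a rough sketch of how such a noise layer can be computed, the panda example is commonly associated with the fast gradient sign method. The notation below is an assumption added for illustration and does not appear in this article: x is the original image, y its true label, θ the model parameters, J the training loss, and ε a small constant that controls how visible the noise is.

```latex
x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\left( \nabla_{x} J(\theta, x, y) \right)
```

Each pixel is nudged by the same small amount ε in the direction that increases the loss, which is why the change is hard to see yet can flip the predicted label. Note that the formula needs the gradient of the loss with respect to the input, so it assumes full access to the model.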

Adversarial machine learning is usually used to expose vulnerabilities in deep neural networks. Researchers often use the term “adversarial attacks” when discussing adversarial machine learning. One of the key aspects of adversarial attacks is that the perturbations must be undetectable to the human eye.

At the ICLR 2019 conference, artificial intelligence researchers at Google showed that the same technique can be used to enable neural networks to perform a new task, hence the name “adversarial reprogramming.”

“We introduce attacks that instead reprogram the target model to perform a task chosen by the attacker,” the researchers wrote at the time.

Adversarial reprogramming shares the same basic idea as adversarial attacks: The developer changes the behavior of a deep learning model not by modifying its parameters but by making changes to its input.

There are, however, also some key differences between adversarial reprogramming and adversarial attacks (aside from the obvious difference in goal). Unlike adversarial examples, reprogramming is not meant to deceive human observers, so the modifications to the input data do not need to be imperceptible to the human eye. Also, while adversarial attacks must calculate a noise map for each input, adversarial reprogramming applies a single perturbation map to all inputs.

Adversarial reprogramming Google
Adversarial reprogramming creates input noise maps that repurpose a deep learning model for a new task (source:

For instance, a deep learning model (e.g., ResNet) trained on the ImageNet dataset can detect 1,000 common things such as animals, plants, and objects. An adversarial program aims to repurpose the AI model for another task, such as counting the number of white squares in an image (see example above). After the adversarial program is applied to the images, the deep learning model will be able to distinguish each class. However, since the model was originally trained for another task, you’ll have to map its output to your target domain. For example, if the model outputs goldfish, then it’s an image with two squares, tiger shark is four squares, etc.

The adversarial program is obtained by starting with a random noise map and making small changes until you achieve the desired outputs.

Basically, adversarial reprogramming creates a wrapper around the deep learning model, modifying every input that goes in with the adversarial noise map and mapping the outputs to the target domain. Experiments by the AI researchers showed that in many cases, adversarial reprogramming can produce better results than transfer learning.
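
The wrapper idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers’ implementation: `toy_model` is a hypothetical stand-in for a pre-trained classifier, the 6x6 `program` is a fixed placeholder rather than a learned noise map, and the names `embed_and_perturb` and `ReprogrammedModel` are invented for this example.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale image: rows of pixel values in [0, 1]


def embed_and_perturb(small: Image, program: Image) -> Image:
    """Centre the target-task image in a frame whose border holds the program."""
    size = len(program)
    h, w = len(small), len(small[0])
    top, left = (size - h) // 2, (size - w) // 2
    framed = [row[:] for row in program]  # copy so the program is not mutated
    for r in range(h):
        for c in range(w):
            framed[top + r][left + c] = small[r][c]
    return framed


class ReprogrammedModel:
    """Wraps an unmodified model: perturb the input, then remap the output."""

    def __init__(self, model: Callable[[Image], str], program: Image,
                 label_map: Dict[str, str]) -> None:
        self.model = model
        self.program = program
        self.label_map = label_map

    def predict(self, small: Image) -> str:
        source_label = self.model(embed_and_perturb(small, self.program))
        return self.label_map[source_label]


def toy_model(image: Image) -> str:
    """Hypothetical stand-in for a pre-trained classifier (not a real network)."""
    total = sum(sum(row) for row in image)
    return "tiger shark" if total > 6.0 else "goldfish"


# Assumption: a constant placeholder program; a real one is learned.
program = [[0.1] * 6 for _ in range(6)]
label_map = {"goldfish": "two squares", "tiger shark": "four squares"}
wrapped = ReprogrammedModel(toy_model, program, label_map)

print(wrapped.predict([[1.0, 1.0], [1.0, 1.0]]))
print(wrapped.predict([[1.0, 0.0], [1.0, 0.0]]))
```

The wrapped model never touches the parameters of `toy_model`. Everything specific to the new task lives in the program and the label map, and the same program is reused for every input.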

Black-box adversarial learning

While adversarial reprogramming does not modify the original deep learning model, you still need access to the neural network’s parameters and layers to train and tune the adversarial program (more specifically, you need access to gradient information). This means that you can’t apply it to black-box models such as the commercial APIs mentioned earlier.

This is where black-box adversarial reprogramming (BAR) enters the picture. The adversarial reprogramming method developed by researchers at IBM and National Tsing Hua University does not need access to the details of deep learning models to change their behavior.

To achieve this, the researchers used Zeroth Order Optimization (ZOO), a technique previously developed by AI researchers at IBM and the University of California, Davis. The ZOO paper demonstrated the feasibility of black-box adversarial attacks, in which an attacker manipulates the behavior of a machine learning model simply by observing its inputs and outputs, without access to gradient information.

BAR uses the same technique to train the adversarial program. “Gradient descent algorithms are primary tools for training deep learning models,” Pin-Yu Chen, chief scientist at IBM Research and co-author of the BAR paper, told TechTalks. “In the zeroth-order setting, you don’t have access to the gradient information for model optimization. Instead, you can only observe the model outputs (aka function values) at queries points.” In effect, this means that you can, for example, only provide an image to the deep learning model and observe its results.

“ZOO enables gradient-free optimization by using estimated gradients to perform gradient descent algorithms,” Chen says. The main advantage of this method is that it can be applied to any gradient-based algorithm and is not limited to neural-network-based systems alone.
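
To make the idea of an estimated gradient concrete, here is a minimal sketch that minimizes a function using only its output values. It assumes a random-direction finite-difference estimator, which is one common zeroth-order scheme and not necessarily the exact estimator used in ZOO or BAR. The function `black_box`, the step sizes, and the query counts are hypothetical choices made for this example.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def estimate_gradient(f: Callable[[Vector], float], x: Vector,
                      num_queries: int = 20, step: float = 1e-3,
                      rng: Optional[random.Random] = None) -> Vector:
    """Estimate the gradient of f at x from function values only."""
    if rng is None:
        rng = random.Random(0)
    d = len(x)
    base = f(x)  # one query at the current point
    grad = [0.0] * d
    for _ in range(num_queries):
        # Draw a random direction on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u)) or 1.0
        u = [v / norm for v in u]
        # One extra query gives the slope of f along that direction.
        shifted = [xi + step * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - base) / step
        for i in range(d):
            grad[i] += d * slope * u[i] / num_queries
    return grad


def zeroth_order_minimize(f: Callable[[Vector], float], x0: Vector,
                          lr: float = 0.1, iterations: int = 200) -> Vector:
    """Plain gradient descent, with the true gradient replaced by the estimate."""
    rng = random.Random(0)
    x = list(x0)
    for _ in range(iterations):
        g = estimate_gradient(f, x, rng=rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def black_box(x: Vector) -> float:
    """Hypothetical stand-in for a loss we can query but not differentiate."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


solution = zeroth_order_minimize(black_box, [0.0, 0.0])
print([round(v, 2) for v in solution])
```

The optimizer only ever calls `black_box` and reads the number it returns, which mirrors sending an image to an API and reading back the scores. The trade-off is query cost: each update here spends 21 calls to the black box.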

Black-box adversarial reprogramming
Black-box adversarial reprogramming can repurpose neural networks for new tasks without having full access to the deep learning model. (source:

Another improvement Chen and his colleagues added in BAR is “multi-label mapping”: Instead of mapping a single class from the source domain to the target domain (e.g., goldfish = one square), they found a way to map several source labels to the target (e.g., tench, goldfish, hammerhead = one square).

“We find that multiple-source-labels to one target-label mapping can further improve the accuracy of the target task when compared to one-to-one label mapping,” the AI researchers write in their paper.
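
A many-to-one mapping of this kind can be sketched as follows. The snippet assumes that the score of a target label is the average of the probabilities the model assigns to its source labels; the probabilities and the second group of source labels are made up for illustration, and `map_to_target` is a hypothetical helper name.

```python
from typing import Dict, List


def map_to_target(source_probs: Dict[str, float],
                  mapping: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each target label by averaging its source-label probabilities."""
    return {
        target: sum(source_probs.get(s, 0.0) for s in sources) / len(sources)
        for target, sources in mapping.items()
    }


# Assumption: illustrative groupings; only the first follows the article's example.
mapping = {
    "one square": ["tench", "goldfish", "hammerhead"],
    "two squares": ["great white shark", "tiger shark", "electric ray"],
}

# Assumption: made-up probabilities returned by a source model for one input.
source_probs = {
    "tench": 0.30, "goldfish": 0.20, "hammerhead": 0.10,
    "great white shark": 0.10, "tiger shark": 0.25, "electric ray": 0.05,
}

scores = map_to_target(source_probs, mapping)
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

Pooling several source labels means the target prediction does not hinge on a single source class, which is one plausible reason the mapping helps accuracy.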

To test black-box adversarial reprogramming, the researchers used it to repurpose several popular deep learning models for three medical imaging tasks (autism spectrum disorder classification, diabetic retinopathy detection, and melanoma detection). Medical imaging is an especially attractive use for techniques such as BAR because it is a domain where data is scarce, expensive to come by, and subject to privacy regulations.

In all three tests, BAR performed better than transfer learning and training the deep learning model from scratch. It also did nearly as well as standard adversarial reprogramming.

The AI researchers were also able to reprogram two commercial, black-box image classification APIs (Clarifai Moderation and NSFW APIs) with BAR, obtaining decent results.

“The results suggest that BAR/AR should be a strong baseline for transfer learning, given that only wrapping the inputs and outputs of an intact model can give good transfer learning results,” Chen said.

In the future, the AI researchers will explore how BAR can be applied to other data modalities beyond image-based applications.
