Blog

Neural networks can hide malware, researchers find

December 9, 2021

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

With their millions and billions of numerical parameters, deep learning models can do many things: detect objects in photos, recognize speech, generate text—and hide malware. Neural networks can embed malicious payloads without triggering anti-malware software, researchers at the University of California, San Diego, and the University of Illinois have found.

Their malware-hiding technique, EvilModel, sheds light on the security concerns of deep learning, which has become a hot topic of discussion in machine learning and cybersecurity conferences. As deep learning becomes ingrained in applications we use every day, the security community needs to think about new ways to protect users against their emerging threats.

Hiding malware in deep learning models

Every deep learning model is composed of multiple layers of artificial neurons. Based on the type of layer, each neuron has connections to all or some of the neurons in its previous and next layer. The strength of these connections is defined by numerical parameters that are during training, as the DL model learns the task it has been designed for. Large neural networks can comprise hundreds of millions or even billions of parameters.

workflow for embedding malware in neural networks — Workflow for EvilModel, a technique that embeds malware in neural networks

The main idea behind EvilModel is to embed malware into the parameters of a neural network in a way that make it invisible to malware scanners. This is a form of steganography, the practice of concealing one piece of information in another.

At the same time, the infected deep learning model must perform its main task (e.g., classifying images) as well or almost as well as a clean model to avoid causing suspicion or making it useless to its victims.

Finally, the attacker must have a mechanism to deliver the infected model to the target devices and extract the malware from the parameters of the neural network.

Changing parameter values

Most deep learning models use 32-bit (4 bytes) floating-point numbers to store parameter values. According to the researchers’ experiments, an attacker can store up to 3 bytes of malware in each parameter without affecting its value significantly.

32-bit floating-point number — Every parameter in a neural network is composed of a 4-byte floating-point number. According to researchers, up to 3 bytes can be used to embed malicious code without changing the number’s value significantly.

When infecting the deep learning model, the attacker breaks the malware into 3-byte pieces and embeds the data into its parameters. To deliver the malware to the target, the attacker can publish the infected neural network on one of several online locations that host deep learning models such as GitHub or TorchHub. Alternatively, the attacker can stage a more complicated form of supply chain attack, where the infected model is delivered through automatic updates to software installed on the target device.

Once the infected model is delivered to the victim, a piece of software extracts the payload and executes it.

Hiding malware in convolutional neural networks

To verify the feasibility of EvilModel, the researchers tested it on several convolutional neural networks (CNN). Several reasons make CNNs an interesting study. First, they are fairly large, usually containing dozens of layers and millions of parameters. Second, they contain a diverse architecture and encompass different types of layers (fully connected, convolutional) and different generalization techniques (batch normalization, dropout layers, pooling layers, etc.), which makes it possible to evaluate the effects of malware-embedding in different settings. Third, CNNs are widely used in computer vision applications, which could make them a prime target for malicious actors. And finally, there are numerous pretrained CNNs that are ready to be integrated into applications without any change, and many developers use pretrained CNNs in their applications without necessarily knowing how deep learning works under the hood.

The researchers first tried embedding malware in AlexNet, a popular CNN which helped renew interest in deep learning in 2012. AlexNet is 178 megabytes and has five convolutional layers and three dense (or fully connected) layers.

alexnet CNN architecture — AlexNet Convolutional Neural Network (CNN)

When AlexNet was trained with batch normalization, a technique that standardizes training examples in groups before running them through the deep learning model, the researchers were able to embed 26.8 megabytes of malware in the model while keeping its accuracy within 1-percent of the clean version. If they increased the volume of malware data, the accuracy would start to drop significantly.

They next tried to retrain the model after infecting it. By freezing the infected neurons, they prevented them from being modified during the extra training cycles. With batch normalization and retraining, the researchers could increase the volume of malware data to 36.9 megabytes while maintaining the model’s accuracy above 90 percent.

malware embedded in neural networks accuracy — Left: Deeper layers of the neural network preserve their accuracy when they’re infected with malware. Right: Batch normalization and retraining after infection improve the model’s accuracy

The models were infected with eight samples from the InQuest database, all of which were recognized as malware by the online VirusTotal scanner. Once the samples were embedded in the neural networks, the entire model was uploaded in VirusTotal but was marked as safe, which proves that the malware was well hidden.

The researchers tested the technique on several other CNN architectures, including VGG, Resnet, Inception, and Mobilenet. They obtained similar results, which shows that malware-embedding is a universal threat to large neural networks.

Securing machine learning pipelines

As malware scanners can’t detect malicious payloads embedded in deep learning models, the only countermeasure against EvilModel is to destroy the malware.

The payload only maintains its integrity if its bytes remain intact. Therefore, if the recipient of an EvilModel retrains the neural network without freezing the infected layer, its parameter values will change and the malware data will be destroyed. Even a single epoch of training is probably enough to destroy any malware embedded in the DL model.

However, most developers use pretrained models as they are, unless they want to fine-tune them for another application. And some forms of finetuning freeze most existing layers in the network, which might include the infected layers.

This means that alongside adversarial attacks, data-poisoning, membership inference, and other known security issues, malware-infected neural networks are a real threat to the future of deep learning.

The Adversarial ML Threat Matrix to provide weak spots in the machine learning pipeline

The differences between machine learning models and classic rule-based software require new ways to think about security threats. Earlier this year, several organizations introduced the Adversarial ML Threat Matrix, a framework that helps find weaknesses in machine learning pipelines and patch security holes.

While the Threat Matrix focuses on adversarial attacks, its methods are also applicable to threats such as EvilModels. Until researchers find more robust methods to detect and block malware in deep neural networks, we must establish a chain of trust in machine learning pipelines. Knowing that malware scanners and static analysis tools can’t detect infected models, developers must make sure their models come from trusted sources and the provenance of training data and learned parameters are not compromised. As we continue to learn about the security of deep learning, we must be wary of what’s hidden behind the millions of numbers that get crunched to analyze our photos and recognize our voices.

How the AI arms race moved from smart models to full-stack…

Why LLMs should stop thinking out loud (and what comes after…

Beyond vibe coding: How Codev 3.0 engineers the AI-powered dev team

How Cursor’s Composer 2.5 uses self-distillation to beat the frontier LLMs…

Vertical integration as AI infrastructure: What 21D’s full arch implant system…

Applied ML: When ‘perfect’ becomes the enemy of ‘good’

AI can’t replace software engineers yet, but here is how to…

How to turbocharge your product and market research with DeepSearch

How looking differently at data can save your machine learning project

Building a solid data foundation for generative AI applications

Demystifying loop engineering: Get more from AI agents, avoid loopmaxxing

Why the future of agentic AI is all about the harness

The evolution of LLM tool-use from API calls to agentic applications

What makes DeepSeek-V3.2 so efficient?

What to know about Claude Opus 4.5

AI is writing your code, but who’s reviewing it?

Machine learning in space: Building intelligent systems for the harshest environments

Decoding the brain, inspiring AI: How Rahul Biswas is bridging neuroscience…

The cash flow conundrum: How technology is reshaping small business finance

What to know about the security of open-source machine learning models

Neural networks can hide malware, researchers find

Hiding malware in deep learning models

Changing parameter values

Hiding malware in convolutional neural networks

Securing machine learning pipelines

Like this:

Leave a ReplyCancel reply

Hiding malware in deep learning models

Changing parameter values

Hiding malware in convolutional neural networks

Securing machine learning pipelines

Like this:

Leave a ReplyCancel reply

Discover more from TechTalks