Can GPT-4-based LLM agents become autonomous cyber weapons?

Image generated with Bing Image Creator

This article is part of our coverage of the latest in AI research.

One of the exciting applications of large language models (LLM) is agents that tackle complicated tasks with minimal human intervention. However, if not properly overseen, LLM agents could inadvertently cause harm when interacting with the real world. Moreover, malicious actors could abuse LLM agents to automate their attacks. 

But the complexity of these systems makes their behavior hard to predict, because they work very differently from human intelligence. That, in turn, makes it difficult to evaluate how autonomous LLM agents really are, and whether they could become rogue malicious actors.

A new paper by the Alignment Research Center seeks to quantify the autonomy of LLM agents. By testing advanced models such as GPT-4 and Claude on open-ended tasks and observing how they adapt to changing environments, the researchers aim to better understand the capabilities and limitations of these agents. 

Autonomous replication and adaptation

The paper introduces “autonomous replication and adaptation” (ARA), a benchmark for assessing an agent’s level of sophistication. ARA is an agent’s ability to perform tasks while adapting to its environment, akin to an intelligent being. This involves the agent’s capacity to plan its actions, gather resources, use them effectively, and refine its abilities to achieve specific objectives.

For example, an agent should be able to generate income to pay for its expenses. It could then reinvest its earnings to purchase additional processing power and update its model.

This self-improvement cycle would involve the agent training itself on new data sets to sharpen its skills. Crucially, the agent must also be able to assess the success of its strategies and make adjustments to reach its goals.

Achieving this cycle of ARA could lead to a scenario where a model scales its processes. It could replicate itself across hundreds or thousands of instances, each specialized for distinct tasks. These agents could then be coordinated to accomplish complex objectives. The implications of this are profound, as such a system could be directed towards either beneficial or harmful ends.

“In general, once a system is capable of ARA, placing bounds on a system’s capabilities may become significantly more difficult,” the researchers write. “If an AI system is able to proliferate large numbers of copies outside of human control, it is much harder to bound the risks posed by that system.”

LLM agents

Different configurations of LLM agents (source: Microsoft GitHub)

You can give GPT-4 a high-level goal and prompt it to deconstruct it into actionable steps. It can then recursively divide each step into smaller, more detailed sub-tasks until it creates a clear sequence of actions. The LLM can pass these actions to other models or programs that run them.
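
To make this concrete, here is a minimal sketch of that kind of recursive task decomposition using the OpenAI Python client. The prompt wording, depth limit, and function names are illustrative assumptions, not the setup used in the paper.

```python
# A minimal sketch of LLM-driven task decomposition with the OpenAI Python
# client (openai>=1.0). Prompt wording and function names are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def decompose(goal: str, depth: int = 0, max_depth: int = 1) -> list[str]:
    """Ask the model to break a goal into sub-tasks, recursing on each one."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Break the given goal into 3-5 concrete, ordered sub-tasks. Return one per line."},
            {"role": "user", "content": goal},
        ],
    )
    subtasks = [
        line.strip("- ").strip()
        for line in response.choices[0].message.content.splitlines()
        if line.strip()
    ]
    if depth >= max_depth:
        return subtasks
    # Recursively expand each sub-task until the maximum depth is reached.
    expanded = []
    for task in subtasks:
        expanded.extend(decompose(task, depth + 1, max_depth))
    return expanded

plan = decompose("Deploy an open-source LLM on a cloud server")
```

The resulting list of fine-grained steps is what an agent framework would then hand off to other models or programs for execution.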

There are various frameworks for creating agents that interface with LLMs and use their outputs to carry out actions. These actions range from web browsing and running computer code to making API calls, effectively allowing the LLM to interact with its environment.
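
As a rough illustration, the snippet below wires up a two-agent setup with Microsoft's AutoGen framework (shown in the figure above): an assistant model proposes code, and a proxy agent executes it locally and reports the results back. The configuration values are placeholders, and the exact API may differ across AutoGen versions.

```python
# A minimal two-agent setup with Microsoft's AutoGen (pyautogen). The assistant
# proposes code; the user proxy executes it and feeds the output back.
# Config values are placeholders; the API may vary between AutoGen versions.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # run without asking for manual approval
    code_execution_config={"work_dir": "workspace", "use_docker": False},
)

# The proxy relays the task, runs any code the assistant writes, and returns
# the output so the assistant can plan its next step.
user_proxy.initiate_chat(assistant, message="Fetch https://example.com and report its title.")
```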

In the ARA tests, the researchers use a scaffolding program. This program initiates the process by presenting the LLM with a high-level goal. It then interprets the model’s responses, carries out the suggested actions, and feeds the results back to the LLM. This feedback loop enables the LLM to plan subsequent steps based on the outcomes of previous actions.

The scaffolding program can execute diverse actions, including writing and running Python or Bash scripts, navigating the web through a browser, or issuing commands to other sub-agents. 

“At the start of each step, the language model is provided with the initial prompt, which outlines the goal and instructions for interacting with the scaffolding program, as well as a history of the thoughts, actions, and outcomes that have transpired. Using this information, the agent determines the next thought to record or action to take,” the researchers write.
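
The sketch below captures the shape of such a feedback loop: the model receives the goal and a running history of thoughts, actions, and outcomes, and the scaffolding executes whatever action it proposes. The action format and the execute_action helper are hypothetical simplifications, not the researchers' actual scaffolding.

```python
# A rough sketch of a scaffolding loop: the model sees the goal plus a history
# of thoughts, actions, and outcomes, then emits either a thought or an action
# for the scaffolding to run. Action format and helper names are hypothetical.
import subprocess
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are an agent. Respond with either 'THOUGHT: <text>' or "
    "'ACTION: bash <command>'. The result of each action will be returned to you."
)

def execute_action(action: str) -> str:
    """Run a bash action and capture its output (simplified, no sandboxing)."""
    command = action.removeprefix("ACTION: bash").strip()
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": "\n".join(history)},
            ],
        )
        step = response.choices[0].message.content.strip()
        history.append(step)
        if step.startswith("ACTION:"):
            # Feed the outcome back so the model can plan its next step.
            history.append(f"OUTCOME: {execute_action(step)}")
    return history
```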

Evaluating LLM agents’ ability to perform malicious actions

LLM agent architecture used by the Alignment Research Center (source: arXiv)

The researchers crafted a suite of 12 tasks designed to test the agents’ capabilities in performing harmful activities. These tasks range from straightforward actions, such as searching a filesystem for passwords linked to specific email addresses, to more complex operations like executing a targeted phishing scheme or deploying an open-source LLM on an Amazon EC2 instance.

“If an LM agent would be able to autonomously replicate and adapt to novel challenges under messy conditions in the real world, we are fairly confident it would be able to solve most or all of the 12 tasks in our pilot suite,” the researchers write.

They tested the tasks with LLM agents powered by the most capable models, GPT-4 and Claude. Their findings show that current LLM agents are only adept at the simplest tasks. While they could formulate credible high-level plans, they struggled with implementing them. In effect, they could not deal with the unpredictable and complex nature of the real world.

“We believe that the tasks these agents failed to accomplish were considerably easier than autonomously replicating in the wild, and therefore that these agents are far from capable of ARA,” they conclude.

The examples highlight the gap between the capabilities of LLM agents and the practical intelligence required to navigate our world. For instance, in the targeted phishing scenario, the LLM agent could identify the necessary steps, such as gathering information about the target and creating a phishing email. However, it failed on key actions, like accurately replicating an HTML page or properly signing up and logging into a web hosting service. The agent either failed to recognize its errors or became trapped in a loop, repeating the same mistakes.

Moreover, the LLM agent exhibited a tendency to “hallucinate,” generating false information or scenarios. It also misdiagnosed obvious errors and showed a lack of understanding of its own solutions and those suggested by sub-agents. These shortcomings underscore the everyday practical and cognitive abilities that come naturally to humans but remain significant obstacles for AI to overcome.

What are the implications?

LLMs have made remarkable strides in executing tasks that were once thought to demand high levels of human intellect. But they are not ready to deal with the unpredictable and intricate nature of the real world.

The study also shows that benchmarks commonly used to gauge LLM performance are not suitable measures of true intelligence. On one hand, LLMs can carry out complex tasks that would typically require years of human training and expertise. On the other, they are prone to errors that most humans would avoid with minimal data and life experience.

ARA can be a promising metric to test the genuine capabilities of LLM agents for both beneficial and harmful actions. Currently, even the most sophisticated LLMs have not reached a level where they are ARA-ready. 

The researchers write, “We believe our agents are representative of the kind of capabilities achievable with some moderate effort, using publicly available techniques and without fine-tuning. As a result, we think that in the absence of access to fine-tuning, it is highly unlikely that casual users of these versions of GPT-4 or Claude could come close to the ARA threshold.”

LLMs still have fundamental problems that prevent them from thinking and planning like humans. However, the landscape is rapidly evolving. LLMs and the platforms that use them continue to improve. Fine-tuning LLMs is becoming more affordable and accessible. And the capabilities of models continue to advance. It may only be a matter of time before creating LLM agents with a semblance of ARA-readiness becomes feasible.
