Are LLMs truly intelligent? New study questions the ‘emergence’ of AI abilities

[Image: AI brain mirage (created with Imagen 4)]

This article is part of our coverage of the latest in AI research.

The term “emergence” is a hot topic in the AI research community. It’s often used to describe the surprising moment when, after being scaled up with more data and computing power, a large language model (LLM) suddenly develops new, unpredicted abilities. But a new paper from researchers at the Santa Fe Institute and Johns Hopkins School of Medicine challenges this popular notion, arguing that we may be mistaking impressive performance for true emergent intelligence.

The popular view of emergence in LLMs

In the LLM world, emergence is commonly defined as the phenomenon where capabilities appear suddenly and unexpectedly as models grow. A smaller model might score near zero on a given task, while a larger version of the same model shows a sudden jump in performance on it.

This has led to claims of emergence for abilities that models were not explicitly trained for, from three-digit addition to legal reasoning. These sudden jumps in performance on specific benchmarks are often cited as evidence that something special is happening inside these scaled-up networks.

[Figure: Large language models show emergent abilities at scale, where performance on a task remains at random levels until the model’s size reaches a certain threshold. After that, performance jumps and continues to improve as the model grows larger (source: arXiv).]
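
To make the shape of that claim concrete, here is a toy sketch (my illustration, not the paper’s): a hypothetical benchmark accuracy that sits near zero across smaller scales, then rises sharply once the parameter count crosses an invented threshold.

```python
import numpy as np

# Toy illustration (not from the paper): accuracy on a hypothetical benchmark
# stays near zero until model scale crosses a threshold, then jumps.
# The 10-billion-parameter threshold and the logistic shape are invented.

def toy_benchmark_accuracy(n_params: float, threshold: float = 1e10) -> float:
    """Hypothetical accuracy curve: a steep logistic in log-parameter space."""
    return 1.0 / (1.0 + np.exp(-4.0 * (np.log10(n_params) - np.log10(threshold))))

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:9.0e} params -> accuracy {toy_benchmark_accuracy(n):.3f}")
```

A curve like this only describes behavior on a benchmark; whether it reflects emergence in any technical sense is exactly what the paper disputes.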

However, the authors of the paper push back on this definition. They argue that these observations, while impressive, don’t meet the more rigorous scientific standard for emergence. In their view, “Very few of the features of LLMs, from the abruptness of performance increases on benchmarks, through to generalization, have much, if anything to do with any technical sense of the word emergence, and are adequately described using the more familiar ideas of learning, inference, compression, and perhaps development.”

What is emergence, really?

Outside of AI, emergence has a precise meaning in complexity science. It describes how a system of many parts develops higher-level properties that can be described with a new, simpler set of rules. 

Think of the difference between molecular dynamics and fluid dynamics. You could try to predict the behavior of a wave by tracking the position and momentum of every single water molecule—an impossibly complex task. Or, you could use the principles of fluid dynamics, which describe the wave using simpler, “coarse-grained” variables like mass, pressure, and flow. The laws of fluid dynamics emerge from the lower-level molecular interactions.
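
As a rough sketch of what coarse-graining looks like in code (my illustration with invented numbers, not an example from the paper), the snippet below collapses a cloud of simulated molecules into grid-level density and mean-flow fields, the kind of variables fluid dynamics actually works with.

```python
import numpy as np

# A minimal sketch of coarse-graining (illustration only, not from the paper):
# replace per-molecule positions and velocities with grid-level density and
# mean-flow fields. The molecule count, box, and flow pattern are invented.

rng = np.random.default_rng(0)
n_molecules = 100_000
positions = rng.uniform(0.0, 1.0, size=(n_molecules, 2))    # x, y in a unit box
velocities = rng.normal(0.0, 1.0, size=(n_molecules, 2))    # random thermal motion
velocities[:, 0] += 3.0 * positions[:, 1]                   # plus a large-scale shear flow

n_bins = 10
cells = np.clip((positions * n_bins).astype(int), 0, n_bins - 1)  # grid cell per molecule

density = np.zeros((n_bins, n_bins))
flow = np.zeros((n_bins, n_bins, 2))
np.add.at(density, (cells[:, 0], cells[:, 1]), 1)            # molecules per cell
np.add.at(flow, (cells[:, 0], cells[:, 1]), velocities)      # summed velocity per cell
flow /= np.maximum(density, 1)[..., None]                    # mean velocity per cell

# Hundreds of thousands of microscopic coordinates are now summarized by a
# 10x10 density map and a 10x10 mean-flow field: the coarse-grained description
# recovers the large-scale shear profile without tracking any single molecule.
print(np.round(flow[..., 0].mean(axis=0), 2))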

This is why the concept is so powerful. As the paper states, “Emergence matters because it leads to an enormous cost saving in how systems are described, predicted, and controlled.” Without it, the only way to understand the world would be to analyze the microscopic interactions of all its basic parts. Systems with emergent properties lend themselves to efficient, high-level descriptions and abstractions that give us levers to control and design things, from building a bridge using classical mechanics to predicting economic trends using macro-level variables.

‘Knowledge-out’ vs. ‘knowledge-in’

The researchers introduce a helpful distinction: “knowledge-out” (KO) versus “knowledge-in” (KI) emergence. KO emergence is common in physics and chemistry, where complex structures (like a snowflake) arise from simple components (water molecules) following simple rules. The complexity comes “out” of the system itself.
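
A classic toy example of this kind of behavior (a standard illustration, not one taken from the paper) is an elementary cellular automaton: a simple local rule applied to a single seeded cell generates an intricate global pattern, with all the complexity coming “out” of the rule itself.

```python
import numpy as np

# Knowledge-out style illustration (standard example, not from the paper):
# Rule 30, a one-byte local update rule, grows a complex pattern from one cell.

RULE = 30
rule_table = [(RULE >> i) & 1 for i in range(8)]   # new cell value for each 3-cell neighborhood

width, steps = 63, 31
row = np.zeros(width, dtype=int)
row[width // 2] = 1                                 # a single "seed" cell

for _ in range(steps):
    print("".join("#" if c else "." for c in row))
    left, right = np.roll(row, 1), np.roll(row, -1)
    neighborhood = 4 * left + 2 * row + right       # encode each triple as 0..7
    row = np.array([rule_table[n] for n in neighborhood])
```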

In contrast, KI emergence characterizes complex adaptive systems like brains, economies, and LLMs. Here, complex behavior arises from systems that process complex inputs from their environment. For an LLM, the “knowledge” is already “in” the vast corpus of text it was trained on. Since LLMs are clearly KI systems, a claim of emergence can’t just be based on their output. It must also describe the “coarse-grained global properties” and the “local microscopic mechanisms” inside the model that produce this behavior.

Emergent capabilities vs. emergent intelligence

Given this framework, the authors argue that for LLMs, the term emergence should be reserved for cases where there is clear evidence of new, compressed internal representations (a true reorganization inside the neural network). This leads to a crucial distinction between “emergent capabilities” and “emergent intelligence.”

An LLM can have many impressive capabilities, just as a calculator can perform arithmetic far better than any human. But we don’t call a calculator intelligent. True “emergent intelligence,” the paper suggests, is the internal use of these compressed, coarse-grained models to solve a broad range of problems efficiently. Humans do this through analogy and abstraction. For example, the inverse-square law is a single, compact concept that, with minimal modification, explains phenomena across gravity, electrostatics, and acoustics. This is the hallmark of human intelligence: doing “more with less.”
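
To make the “more with less” point concrete, here is a small sketch (standard physics, not something from the paper) in which one generic inverse-square function is reused for Newtonian gravity and Coulomb electrostatics simply by swapping the constant and the source strengths.

```python
# One compact form, reused across domains (standard physics, not from the paper):
# swap the constant and the source strengths and the same 1/r^2 law covers
# gravity and electrostatics (and, as intensity over a sphere, acoustics too).

def inverse_square(k: float, s1: float, s2: float, r: float) -> float:
    """Generic inverse-square interaction: k * s1 * s2 / r**2."""
    return k * s1 * s2 / r**2

G   = 6.674e-11   # gravitational constant, N m^2 / kg^2
k_e = 8.988e9     # Coulomb constant, N m^2 / C^2

# Earth-Moon gravitational pull (~2e20 N) and proton-electron attraction in a
# hydrogen atom (~8e-8 N), both from the same one-line abstraction.
print(inverse_square(G, 5.972e24, 7.348e22, 3.844e8))
print(inverse_square(k_e, 1.602e-19, 1.602e-19, 5.29e-11))
```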

LLMs, on the other hand, have not yet proven they operate this way. As the authors conclude, “Emergent intelligence is clearly a feature of human reasoning and human language, but as of yet, an unproven feature of LLMs, which at best… demonstrate emergent capability.”

Why it matters for developers and users

This distinction has real-world consequences for how we build and use AI. We are at a point where the performance of LLMs is easily mistaken for human-like intelligence. This can lead us to grant them agency and decision-making power in sensitive or mission-critical tasks.

But these models, for all their capabilities, can make bafflingly simple mistakes. They can perform highly complex tasks one moment and fail at a simple one the next, making them unpredictable in situations where human behavior is reliable. They possess a powerful, but alien, form of problem-solving that is not a direct substitute for human intelligence.

The key takeaway is that we must not mistake capability for intelligence. For now, LLMs are best treated as powerful assistants that can amplify human productivity on specific tasks. But when it comes to the key decisions, a human must remain in control, fully aware that the tool they’re using, while capable, doesn’t “think” like we do.
