Stanford reinforcement learning system simulates evolution

October 25, 2021

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Hundreds of millions of years of evolution have blessed our planet with a wide variety of lifeforms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensure its survival in its environment.

But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after development. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest lifeforms.

In a new paper published in the scientific journal Nature Communications, AI researchers at Stanford University present a new technique that can help take steps toward overcoming some of these limits. Called “Deep Evolutionary Reinforcement Learning” (DERL), the new technique uses a complex virtual environment and reinforcement learning to create virtual agents that can evolve both in their physical structure and learning capacities. The findings could have important implications for the future of AI and robotics research.

Evolution is hard to simulate

In nature, body and brain evolve together. Across many generations, every animal species has gone through countless cycles of mutation to grow limbs, organs, and a nervous system to support the functions it needs in its environment. Mosquitos have thermal vision to spot body heat. Bats have wings to fly and an echolocation apparatus to navigate dark places. Sea turtles have flippers to swim and a magnetic field detector system to travel very long distances. Humans have an upright posture that frees their arms and lets them see the far horizon, hands and nimble fingers that can manipulate objects, and a brain that makes them the best social creatures and problem solvers on the planet.

Interestingly, all these species descended from the first lifeform that appeared on Earth several billion years ago. Based on the selection pressures caused by the environment, the descendants of those first living beings evolved in many different directions.

Studying the evolution of life and intelligence is interesting. But replicating it is extremely difficult. An AI system that sets out to recreate intelligent life the way evolution did would have to search a very large space of possible morphologies, which is computationally very expensive. It would need many parallel and sequential trial-and-error cycles.

AI researchers use several shortcuts and predesigned features to overcome some of these challenges. For example, they fix the architecture or physical design of an AI or robotic system and focus on optimizing the learnable parameters. Another shortcut is the use of Lamarckian rather than Darwinian evolution, in which AI agents pass on their learned parameters to their descendants. Yet another approach is to train different AI subsystems separately (vision, locomotion, language, etc.) and then tack them together in a final AI or robotic system. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of results that can be achieved.

Deep Evolutionary Reinforcement Learning

In their new work, the researchers at Stanford aim to bring AI research a step closer to the real evolutionary process while keeping the costs as low as possible. “Our goal is to elucidate some principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control,” they write in their paper.

Their framework is called Deep Evolutionary Reinforcement Learning (DERL). In DERL, each agent uses deep reinforcement learning to acquire the skills it needs to achieve its goals during its lifetime. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, the agents inherit only the physical and architectural traits of their parents (along with slight mutations). None of the learned parameters are passed on across generations.
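The two nested loops described above can be illustrated with a toy sketch. This is not the researchers' implementation: the one-number `Morphology`, the hill-climbing stand-in for deep reinforcement learning, the made-up reward function, and the tournament size of 4 are all assumptions chosen to keep the example small. What it does show is the Darwinian constraint: each child receives a mutated copy of its parent's body and then learns its controller from scratch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphology:
    """Toy stand-in for a body plan: one heritable number (hypothetical)."""
    limb_length: float


def lifetime_learning(morph: Morphology, steps: int, rng: random.Random) -> float:
    """Learn a controller from scratch and return the final reward.

    Placeholder for deep RL: hill-climbs a single 'gain' parameter.
    """
    def reward(gain: float) -> float:
        # Made-up reward: at most limb_length, reached when gain == limb_length.
        return morph.limb_length - (gain - morph.limb_length) ** 2

    gain = 0.0  # every agent starts untrained; nothing learned is inherited
    best = reward(gain)
    for _ in range(steps):
        candidate = gain + rng.gauss(0.0, 0.1)
        r = reward(candidate)
        if r > best:
            gain, best = candidate, r
    return best


def mutate_morphology(morph: Morphology, rng: random.Random) -> Morphology:
    """Return a slightly perturbed copy of the parent's body."""
    new_len = morph.limb_length + rng.gauss(0.0, 0.1)
    return Morphology(limb_length=min(2.0, max(0.1, new_len)))


def evolve(pop_size: int = 16, generations: int = 20, steps: int = 50, seed: int = 0):
    """Outer Darwinian loop around the inner lifetime-learning loop."""
    rng = random.Random(seed)
    population = [Morphology(rng.uniform(0.1, 1.0)) for _ in range(pop_size)]
    fitness = [lifetime_learning(m, steps, rng) for m in population]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Tournament selection: the fittest of 4 random agents becomes a parent.
            contenders = rng.sample(range(pop_size), 4)
            parent = max(contenders, key=lambda i: fitness[i])
            # Only the (mutated) morphology is passed on, never the learned gain.
            children.append(mutate_morphology(population[parent], rng))
        population = children
        fitness = [lifetime_learning(m, steps, rng) for m in population]
    best = max(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]
```

Calling `evolve(seed=0)` returns the best morphology of the final generation together with the reward it reached after learning. A Lamarckian variant would instead hand the parent's learned `gain` to the child as its starting point.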

“DERL opens the door to performing large-scale in silico experiments to yield scientific insights into how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and the learnability of control tasks,” the researchers write.

Simulating evolution

For their framework, the researchers used MuJoCo, a physics engine that provides highly accurate rigid-body simulation. Their design space is called UNIversal aniMAL (UNIMAL), in which the goal is to create morphologies that learn locomotion and object-manipulation tasks in a variety of terrains.

Each agent in the environment is composed of a genotype that defines its limbs and joints. The direct descendant of each agent inherits the parent’s genotype and goes through mutations that can create new limbs, remove existing limbs, or make small modifications to characteristics such as the degrees of freedom or the size of limbs.
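A minimal sketch of such mutation operators is shown below, assuming the genotype can be modeled as a tree of limbs. The `Limb` fields, the numeric ranges, and the choice between one and two degrees of freedom are hypothetical and are not taken from the UNIMAL specification.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Limb:
    """One node of a hypothetical tree-shaped genotype."""
    length: float
    dof: int  # degrees of freedom of the joint attaching this limb to its parent
    children: List["Limb"] = field(default_factory=list)


def all_limbs(root: Limb) -> List[Limb]:
    """Collect every limb in the tree, root first."""
    out = [root]
    for child in root.children:
        out.extend(all_limbs(child))
    return out


def mutate_genotype(parent: Limb, rng: random.Random) -> Limb:
    """Return a mutated copy of the parent's genotype; the parent is not modified."""
    child = copy.deepcopy(parent)
    limbs = all_limbs(child)
    op = rng.choice(["add", "remove", "resize", "change_dof"])
    if op == "add":
        # Grow a new limb on a randomly chosen existing limb.
        rng.choice(limbs).children.append(Limb(length=rng.uniform(0.2, 0.4), dof=1))
    elif op == "remove":
        # Only leaf limbs are removed, and the root always stays.
        leaves = [(p, c) for p in limbs for c in p.children if not c.children]
        if leaves:
            p, leaf = rng.choice(leaves)
            p.children = [c for c in p.children if c is not leaf]
    elif op == "resize":
        # Small modification of limb size (at most 10% in either direction).
        limb = rng.choice(limbs)
        limb.length = max(0.05, limb.length * rng.uniform(0.9, 1.1))
    else:
        # Toggle the joint between one and two degrees of freedom.
        limb = rng.choice(limbs)
        limb.dof = 2 if limb.dof == 1 else 1
    return child
```

Copying the parent before mutating keeps the parent's genotype intact, so the same parent can produce several different children.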

Each agent is trained with reinforcement learning to maximize rewards in various environments. The most basic task is locomotion, in which the agent is rewarded for the distance it travels during an episode. Agents whose physical structure is better suited for traversing terrain learn faster to use their limbs for moving around.
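One simple way to express a distance-based locomotion reward is to pay the agent its forward displacement at every time step, so that the per-step rewards add up to the net distance covered in the episode. The sketch below assumes a one-dimensional forward axis and hypothetical function names; the reward used in the paper may include additional terms.

```python
from typing import List, Sequence


def step_rewards(x_positions: Sequence[float]) -> List[float]:
    """Per-step reward: forward displacement between consecutive time steps."""
    return [b - a for a, b in zip(x_positions, x_positions[1:])]


def episode_return(x_positions: Sequence[float]) -> float:
    """Sum of step rewards, which equals the final position minus the start."""
    return sum(step_rewards(x_positions))
```

Because the sum telescopes, stepping backward costs the agent reward and only net forward progress counts.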

To test the system’s results, the researchers generated agents in three types of terrains: flat (FT), variable (VT), and variable terrains with modifiable objects (MVT). The flat terrain puts the least selection pressure on the agents’ morphology. The variable terrains, on the other hand, force the agents to develop a more versatile physical structure that can climb slopes and move around obstacles. The MVT variant has the added challenge of requiring the agents to manipulate objects to achieve their goals.

The benefits of DERL

Figure: Deep Evolutionary Reinforcement Learning generates a variety of successful morphologies across different environments

One of the interesting findings of DERL is the diversity of the results. Other approaches to evolutionary AI tend to converge on one solution because new agents directly inherit the physique and learnings of their parents. But because only morphological data is passed on to descendants in DERL, the system ends up creating a diverse set of successful morphologies, including bipeds, tripeds, and quadrupeds with and without arms.

At the same time, the system shows traits of the Baldwin effect, which suggests that agents that learn faster are more likely to reproduce and pass on their genes to the next generation. DERL shows that evolution “selects for faster learners without any direct selection pressure for doing so,” according to the Stanford paper.

“Intriguingly, the existence of this morphological Baldwin effect could be exploited in future studies to create embodied agents with lower sample complexity and higher generalization capacity,” the researchers write.
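One way to make "faster learner" measurable is to count how many training steps an agent needs before its reward first reaches a fixed threshold, and then compare that count across generations. The helper below is a hypothetical sketch of such a metric, not the measurement used in the paper.

```python
from typing import Optional, Sequence


def steps_to_threshold(learning_curve: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first training step whose reward reaches the threshold.

    Returns None if the agent never reaches it.
    """
    for step, reward in enumerate(learning_curve):
        if reward >= threshold:
            return step
    return None


def is_faster_learner(curve_a: Sequence[float], curve_b: Sequence[float], threshold: float) -> bool:
    """True if agent A reaches the threshold strictly earlier than agent B."""
    a = steps_to_threshold(curve_a, threshold)
    b = steps_to_threshold(curve_b, threshold)
    if a is None:
        return False
    return b is None or a < b
```

If a morphological Baldwin effect is present, this count should tend to shrink for later generations even though selection only looks at the final reward.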

Figure: Agents trained in DERL are evaluated across a variety of tasks

Finally, the DERL framework also validates the hypothesis that more complex environments will give rise to more intelligent agents. The researchers tested the evolved agents across eight different tasks, including patrolling, escaping, manipulating objects, and exploration. Their findings show that in general, agents that have evolved in variable terrains learn faster and perform better than AI agents that have only experienced flat terrain.

Their findings seem to be in line with another hypothesis by DeepMind researchers that a complex environment, a suitable reward structure, and reinforcement learning can eventually lead to the emergence of all kinds of intelligent behaviors.

AI and robotics research

The DERL environment only has a fraction of the complexities of the real world. “Although DERL enables us to take a significant step forward in scaling the complexity of evolutionary environments, an important line of future work will involve designing more open-ended, physically realistic, and multi-agent evolutionary environments,” the researchers write.

In the future, the researchers will expand the range of evaluation tasks to better assess how the agents can enhance their ability to learn human-relevant behaviors.

The work can have important implications for the future of AI and robotics and push researchers to use exploration methods that are much more similar to natural evolution.

“We hope our work encourages further large-scale explorations of learning and evolution in other contexts to yield new scientific insights into the emergence of rapidly learnable intelligent behaviors, as well as new engineering advances in our ability to instantiate them in machines,” the researchers write.

