
What happens when AI plays hide-and-seek 500 million times

September 23, 2019

Artificial intelligence lab OpenAI introduced an AI model that developed complex behavior after millions of rounds of hide-and-seek competition (Image source: YouTube)

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

For decades, artificial intelligence scientists have sought to create intelligent machines by studying and replicating the structure and functionality of the human brain. Last week, researchers at AI research lab OpenAI introduced a more fundamental approach to developing AI, a project inspired by natural selection and competition, the simple rules that have led to the evolution of all living beings, including humans.

The AI researchers pitted multiple AI agents against each other in pursuit of conflicting goals. They observed that the agents developed new and sophisticated behavior over the long term. While the project draws on existing AI techniques and concepts, it might provide new approaches and ideas for creating AI applications.

How AI discovers the rules of hide-and-seek

The OpenAI researchers presented their idea, published as a blog post and an accompanying video, in a virtual hide-and-seek game environment. In this setting, AI agents are divided into “hiders” and “seekers.” The seekers must discover the hiders, and the hiders must avoid being discovered. The hiders get a limited head start to hide. During this time, the seekers are frozen and can’t do anything. Each of the AI agents is left on its own to discover ways to use its environment and the surrounding objects to achieve its goals.

The system uses reinforcement learning, an AI technique that is especially popular in research projects. AI researchers have previously used reinforcement learning to create bots that can play complicated games such as Go, StarCraft and Dota 2.

In reinforcement learning, each AI agent is given goals, incentives and the basic rules of an environment. The AI must then find ways to optimize its behavior to achieve the goals and maximize rewards.
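The loop described above can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. This is a toy illustration only and not OpenAI’s method: the five-cell corridor, the `step` and `train` functions, and every hyperparameter are assumptions made up for this example. The agent is told nothing about the corridor except the reward it receives, and it still learns that walking right is the best behavior.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Environment rules: stay inside the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    """Pick the highest-valued action index, breaking ties at random."""
    best = max(q_row)
    return rng.choice([i for i, value in enumerate(q_row) if value == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))      # explore
            else:
                a = greedy_action(q[state], rng)     # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Best learned action for every non-goal cell.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

The discount factor `gamma` is what trades off short-term against long-term reward: values closer to 1 make the agent care more about rewards that arrive many steps later.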

Since the goals of the hiders and seekers are in direct conflict, the hide-and-seek setting provides the perfect environment for reinforcement learning. One of the challenges of reinforcement learning is providing AI agents with the right goals and rewards. AI agents often develop bizarre behavior when they try to balance between short- and long-term goals. This is why AI engineers must often provide explicit goals and tweak them as the agents progress.

One of the interesting twists of OpenAI’s new hide-and-seek system is that the researchers did not provide the AI agents with specific rewards on using objects in their environments. Instead the agents were only given rewards based on their team goals. Hiders were rewarded if they remained hidden and penalized if they were discovered. Seekers were rewarded and penalized in the opposite way.

The hiders and seekers had to discover on their own how they could use boxes, ramps and walls strewn across the game environment to achieve their goals.
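The team-based reward described above might look something like the following sketch. The article only says that hiders are rewarded for staying hidden and penalized for being discovered, with seekers rewarded in the opposite way; the exact values of +1 and -1, the rule that a single spotted hider penalizes the whole hiding team, the zero reward during the head start, and the name `team_rewards` are all assumptions for illustration.

```python
def team_rewards(hider_seen, in_head_start=False):
    """Return per-team rewards for one time step.

    hider_seen: list of booleans, one per hider, True if a seeker sees that hider.
    in_head_start: True while seekers are still frozen (assumed to give no reward).
    """
    if in_head_start:
        return {"hiders": 0.0, "seekers": 0.0}
    # Assumption: one discovered hider is enough to penalize the whole team.
    hider_reward = -1.0 if any(hider_seen) else 1.0
    # Seekers receive the exact opposite, so the two goals are in direct conflict.
    return {"hiders": hider_reward, "seekers": -hider_reward}


all_hidden = team_rewards([False, False])
one_found = team_rewards([False, True])
```

Note that nothing in this function mentions boxes, ramps or walls. Any tool use has to emerge because it helps the team score, not because it is rewarded directly.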

Playing hide-and-seek millions of times

During the first stages of the training, the researchers released a team of two hiders and two seekers in a simple environment. At first, the AI agents failed to manifest any form of intelligent behavior and the hide-and-seek competition was a random chase between hiders and seekers. But over the course of millions of games both AI teams developed interesting individual and team tactics.

First, the hiders learned to move boxes to block doors and enclose themselves in rooms to prevent seekers from finding them. Each team can move objects and lock them in place to prevent the other team from manipulating them. (The researchers probably added this last detail to simplify the environment for the AI agents. Otherwise, the game could turn into an endless competition of hiders and seekers manipulating the same objects.)
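The locking rule can be pictured with a small sketch. The article only states that a team can lock an object so the other team cannot manipulate it; the class name `MovableObject`, its methods, and the assumption that a locked object stays put until the locking team releases it are hypothetical choices made for this example, not details of OpenAI’s environment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MovableObject:
    """Hypothetical box or ramp that a team can lock in place."""
    name: str
    position: Tuple[int, int] = (0, 0)
    locked_by: Optional[str] = None   # team holding the lock, or None

    def lock(self, team: str) -> bool:
        """Lock the object for `team`; fails if it is already locked."""
        if self.locked_by is None:
            self.locked_by = team
            return True
        return False

    def unlock(self, team: str) -> bool:
        """Only the team that locked the object may unlock it."""
        if self.locked_by == team:
            self.locked_by = None
            return True
        return False

    def move(self, new_position: Tuple[int, int]) -> bool:
        """Assumption: a locked object cannot be moved by anyone."""
        if self.locked_by is not None:
            return False
        self.position = new_position
        return True


ramp = MovableObject("ramp")
ramp.lock("hiders")
seekers_unlocked = ramp.unlock("seekers")   # the other team cannot release it
seekers_moved = ramp.move((3, 4))           # nor can it be moved while locked
```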

But then, the seekers learned to move the ramps next to walls and climb over them to discover the hiders. In response, the hiders learned to move the ramps inside their hideout to prevent the seekers from using them. This entire process spanned 50 million hide-and-seek episodes.

Afterwards, the AI researchers tried their model in a more open environment and increased the number and variety of objects to add to the complexity. The number of AI agents also increased, further opening the way for cooperation and teamwork. Again, the process started with random moves by the AI agents. But the AI gradually started to manifest interesting behavior patterns.

Unlike the simple setting, the open environment didn’t have any closed rooms. But the hiders learned to create their own hideouts by placing wall objects around themselves and locking them. Again, the seeker AI learned to use ramps to get over the walls and discover the hiders.

In the more advanced experiment, the AI agents are released in open environments and left to discover the rules of creating hideouts (Image source: YouTube)

A couple hundred million matches later, the hiders finally learned that they can lock the ramps before retreating into their hideouts. This prevented the seekers from moving the ramps next to the hideouts.

Fast forward another hundred million matches, and the AI developed a new tactic: The seekers moved an unlocked box next to the locked ramp, used the ramp to climb onto the box, then glided on it to the walled hideout and discovered the hiders.

Finally, after 480 million episodes, the hiders learned that if they lock all objects before building their hideout, the seekers will have no way to discover them. Interestingly, they did it as a team, making sure they locked everything before the seekers were activated.

Over the millions of hide-and-seek episodes, the AI also found ways to exploit the game system to achieve its goals. For instance, in one case, the hiders learned to throw the ramps outside the game environment to prevent the seekers from using them. In another, the seekers managed to find a glitch in the system to fly over the playing field and find the hiders from above.

Some key takeaways from the hide-and-seek AI

OpenAI’s latest achievement highlights some of the characteristics of current artificial intelligence technologies. While the hide-and-seek AI is very impressive, I wouldn’t consider it a breakthrough, but rather a new way to test the limits and possibilities of current AI.

One thing that might be misrepresented in OpenAI’s blog is “intelligent behavior.” It seems that the AI agents are learning to use tools and develop teamwork as humans did over thousands of years. But deep learning and neural networks, the currently prevalent techniques in AI, work very differently from the human brain.

Neural networks’ “understanding” of the world and the way things work is just too different from ours. That’s why it takes them millions of rounds of trial and error to discover things that a human can reason about subconsciously, such as the fact that moving a box in front of a doorway will block it and prevent others from entering, or that a ramp can enable you to climb over walls and other objects.

So while there’s an ongoing debate over the true definition of intelligence, I would caution against calling the AI’s behavior “intelligence.”

Also, as the OpenAI researchers note in their video, “this world is far less complex than earth.” Reinforcement learning is a good AI tool for research and discovery. But experience shows that the complexities of the real world can’t be fully simulated in reinforcement learning, which is suitable for scenarios where the rules are clear cut.

One interesting fact about OpenAI’s hide-and-seek project is that it confirms another argument about the history of advances in AI. A few months ago, machine learning researcher Rich Sutton posited that in the past six decades, the only AI techniques that have proven to be successful are those that can leverage advances in computing power and data, namely search and learning.

As with its previous AI models, OpenAI’s success was largely due to its ability to throw more processing power and data at the problem.

Meanwhile, the fact that success in AI is bound to compute resources also highlights a worrying truth about current AI. Deep learning research is very expensive. OpenAI didn’t release figures on the hide-and-seek AI model, but similar research projects can easily cost tens of millions of dollars in cloud resources (to say nothing of the huge salaries the lab has to pay to its researchers). This has limited AI research to organizations with deep pockets, and has slowly transitioned the endeavor from academic institutions to large tech companies.

But one very interesting conclusion from the project is the new light in which we can view artificial intelligence. For decades, we have used the human brain as the example to follow when creating intelligent systems. Maybe looking at the less complicated but equally efficient mechanisms of evolution will help us discover new ways to create systems that can solve problems in efficient (if not necessarily intelligent) ways. And who knows, maybe along the way, it’ll help us solve the ever-elusive mystery of creating human-level intelligence.

