
Reinforcement learning challenge to push boundaries of embodied AI

April 26, 2021

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Since the early decades of artificial intelligence, humanoid robots have been a staple of sci-fi books, movies, and cartoons. Yet, after decades of research and development in AI, we still have nothing that comes close to the Jetsons’ Rosey the Robot.

This is because many of our intuitive planning and motor skills, things that we take for granted, are a lot more complicated than we think. Navigating unknown areas, finding and picking up objects, choosing routes, and planning tasks are complicated feats that we only appreciate when we try to turn them into computer programs.

Developing robots that can physically sense the world and interact with their environment is the realm of embodied artificial intelligence, one of the long-sought goals of AI scientists. And though progress in the field is still a far cry from the capabilities of humans and animals, the achievements are remarkable nonetheless.

In a recent development in embodied AI, scientists at IBM, the Massachusetts Institute of Technology, and Stanford University developed a new challenge that will help assess the ability of AI agents to find paths, interact with objects, and plan tasks efficiently. Titled “ThreeDWorld Transport Challenge,” the test is a virtual environment that will be presented at the Embodied AI Workshop during the Conference on Computer Vision and Pattern Recognition, held online in June.

No current AI technique comes close to solving the TDW Transport Challenge. But the results of the competition can help find new directions for the future of embodied AI and robotics research.

Reinforcement learning in virtual environments

At the heart of most robotics applications is reinforcement learning, a branch of machine learning that is based on actions, states, and rewards. A reinforcement learning agent is given a set of actions that it can apply to its environment to obtain rewards or reach a certain goal. These actions create changes to the state of the agent and the environment. The RL agent receives rewards based on how its actions bring it closer to its goal.

RL agents usually start by knowing nothing about their environment and selecting random actions. As they gradually receive feedback from their environment, they learn sequences of actions that can maximize their rewards.
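To make the loop of actions, states, and rewards concrete, here is a minimal sketch of tabular Q-learning on a toy corridor. It is not related to the TDW environment: the corridor, the reward of 1 at the right end, and all hyperparameters are assumptions chosen for illustration.

```python
import random

def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (states 0..n_states-1, goal at the right end)."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    # The agent starts knowing nothing: every action value is zero.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Act randomly when exploring or when the estimates are tied,
            # otherwise pick the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + moves[action], 0), n_states - 1)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Feedback from the environment nudges the estimate toward
            # the reward plus the discounted value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_q_learning()
greedy_policy = ["left" if left > right else "right"
                 for left, right in q_table[:-1]]
print(greedy_policy)
```

After training, the greedy policy steps right in every non-goal state, which is the sequence of actions that maximizes reward in this toy setting.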

This scheme is used not only in robotics, but in many other applications such as self-driving cars and content recommendation. Reinforcement learning has also helped researchers master complicated games such as Go, StarCraft 2, and DOTA.

Creating reinforcement learning models presents several challenges. One of them is designing the right set of states, rewards, and actions, which can be very difficult in applications such as robotics, where agents face a continuous environment that is affected by complicated factors such as gravity, wind, and physical interactions with other objects (in contrast, environments like chess and Go have very discrete states and actions).

Another challenge is gathering training data. Reinforcement learning agents need to train over data from millions of episodes of interactions with their environments. This constraint can slow down robotics applications because they must gather their data from the physical world as opposed to video and board games, which can be played in rapid succession on several computers.

To overcome this barrier, AI researchers have tried to create simulated environments for reinforcement learning applications. Today, self-driving cars and robotics often use simulated environments as a major part of their training regime.

“Training models using real robots can be expensive and sometimes involve safety considerations,” Chuang Gan, Principal Research Staff Member at the MIT-IBM Watson AI Lab, told TechTalks. “As a result, there has been a trend toward incorporating simulators, like what the TDW-Transport Challenge provides, to train and evaluate AI algorithms.”

But replicating the exact dynamics of the physical world is extremely difficult, and most simulated environments are a rough approximation of what a reinforcement learning agent would face in the real world. To address this limitation, the TDW Transport Challenge team has gone to great lengths to make the test environment as realistic as possible.

The environment is built on top of the ThreeDWorld platform, which the authors describe as “a general-purpose virtual world simulation platform supporting both near-photo realistic image rendering, physically-based sound rendering, and realistic physical interactions between objects and agents.”

“We aimed to use a more advanced physical virtual environment simulator to define a new embodied AI task requiring an agent to change the states of multiple objects under realistic physical constraints,” the researchers write in an accompanying paper.

Task and motion planning

Reinforcement learning tests have different degrees of difficulty. Most current tests involve navigation tasks, where an RL agent must find its way through a virtual environment based on visual and audio input.

The TDW Transport Challenge, on the other hand, pits the reinforcement learning agents against “task and motion planning” (TAMP) problems. TAMP requires the agent to not only find optimal movement paths but to also change the state of objects to achieve its goal.

The challenge takes place in a multi-roomed house adorned with furniture, objects, and containers. The reinforcement learning agent views the environment from a first-person perspective and must find one or several objects from the rooms and gather them at a specified destination. The agent is a two-armed robot, so it can only carry two objects at a time. Alternatively, it can use a container to carry several objects and reduce the number of trips it has to make.

At every step, the RL agent can choose one of several actions, such as turning, moving forward, or picking up an object. The agent receives a reward if it accomplishes the transfer task within a limited number of steps.

While this seems like the kind of problem any child could solve without much training, it is indeed a complicated task for current AI systems. The reinforcement learning program must find the right balance between exploring the rooms, finding optimal paths to the destination, choosing between carrying objects alone or in containers, and doing all this within the designated step budget.
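The trade-off between carrying objects by hand and using a container can be sketched as simple arithmetic. The numbers below (steps per round trip, container capacity, and the extra steps spent fetching a container) are hypothetical and not taken from the challenge.

```python
import math

def trips_needed(num_objects: int, carry_capacity: int) -> int:
    """Round trips required when at most carry_capacity objects fit per trip."""
    return math.ceil(num_objects / carry_capacity)

def estimated_steps(num_objects: int, carry_capacity: int,
                    steps_per_round_trip: int, setup_steps: int = 0) -> int:
    """Rough step cost: one-off setup plus a fixed cost per round trip."""
    return setup_steps + trips_needed(num_objects, carry_capacity) * steps_per_round_trip

# Hypothetical scenario: five target objects, 120 steps per round trip.
bare_hands = estimated_steps(num_objects=5, carry_capacity=2,
                             steps_per_round_trip=120)
# Assumed: a container holds four objects but costs 80 steps to locate and pick up.
with_container = estimated_steps(num_objects=5, carry_capacity=4,
                                 steps_per_round_trip=120, setup_steps=80)

print(bare_hands, with_container)  # 360 320
```

In this made-up case the container pays off, but with fewer objects or a container that is hard to find, carrying by hand would be cheaper. The agent has to discover that balance within its step budget.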

“Through the TDW-Transport Challenge, we’re proposing a new embodied AI challenge,” Gan said. “Specifically, a robotic agent must take actions to move and change the state of a large number of objects in a photo- and physically-realistic virtual environment which remains a complex goal in robotics.”

Abstracting challenges for AI agents

TDW challenge first-person view — In the ThreeDWorld Transport Challenge, the AI agent can see the world through color, depth, and segmentation maps.

While TDW is a very complex simulated environment, the designers have still abstracted some of the challenges that robots would face in the real world. The virtual robot agent, dubbed Magnebot, has two arms with nine degrees of freedom with joints at the shoulder, elbow, and wrist. However, the robot’s hands are magnets and can pick up any object without the need to handle it with fingers, which itself is a very challenging task.

The agent also perceives the environment in three different ways: an RGB-colored frame, a depth map, and a segmentation map that shows each object separately in hard colors. The depth and segmentation maps make it easier for the AI agent to read the dimensions of the scene and tell the objects apart when viewed from awkward angles.
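As a rough illustration of why the extra maps help, the sketch below combines a segmentation map and a depth map to find the nearest visible point of each object. The data layout (nested lists of color tuples and distances in meters) is an assumption made for this example, not the format the challenge actually provides.

```python
def nearest_depth_per_object(segmentation, depth):
    """Map each segmentation color to the smallest depth seen for that color."""
    nearest = {}
    for seg_row, depth_row in zip(segmentation, depth):
        for color, distance in zip(seg_row, depth_row):
            if color not in nearest or distance < nearest[color]:
                nearest[color] = distance
    return nearest

# Hypothetical 2x3 frame: red pixels belong to one object, blue to another.
RED, BLUE, WALL = (255, 0, 0), (0, 0, 255), (0, 0, 0)
segmentation = [
    [WALL, RED, RED],
    [BLUE, RED, WALL],
]
depth = [
    [4.0, 2.5, 2.4],
    [1.2, 2.6, 4.1],
]

distances = nearest_depth_per_object(segmentation, depth)
print(distances[RED], distances[BLUE])  # 2.4 1.2
```

With a plain RGB frame alone, the agent would first have to learn to separate objects from the background and estimate distance, which is a hard perception problem on its own.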

To avoid confusion, the problems are posed in a simple structure (e.g., “vase:2, bowl:2, jug:1; bed”) as opposed to loose language commands (e.g., “Grab two bowls, a couple of vases, and the jug in the bedroom, and put them all on the bed”).
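A structured goal like the one quoted above is trivial to parse, which is the point of avoiding free-form language. The following sketch assumes the format is exactly "name:count" pairs separated by commas, then a semicolon and the destination; the function name is hypothetical and the challenge's real interface may represent goals differently.

```python
def parse_task(spec: str):
    """Split a goal string into a {object: count} dict and a destination name."""
    objects_part, destination = spec.split(";")
    targets = {}
    for item in objects_part.split(","):
        name, count = item.split(":")
        targets[name.strip()] = int(count)
    return targets, destination.strip()

targets, destination = parse_task("vase:2, bowl:2, jug:1; bed")
print(targets)      # {'vase': 2, 'bowl': 2, 'jug': 1}
print(destination)  # bed
```

The loose-language version of the same command would require a natural language understanding component before any planning could begin.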

And to simplify the state and action space, the researchers have limited the Magnebot’s navigation to 25-centimeter movements and 15-degree rotations.
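Here is a minimal sketch of what such a discretized action space looks like in code. The 25-centimeter and 15-degree step sizes come from the article; the action names, the 2D pose representation, and the coordinate convention are assumptions for illustration.

```python
import math
from dataclasses import dataclass

MOVE_STEP_M = 0.25    # forward movement per action, in meters
TURN_STEP_DEG = 15    # rotation per action, in degrees

@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    heading_deg: int  # 0 faces +x, 90 faces +y (assumed convention)

def apply_action(pose: Pose, action: str) -> Pose:
    """Return the pose after one discrete action, ignoring collisions."""
    if action == "turn_left":
        return Pose(pose.x, pose.y, (pose.heading_deg + TURN_STEP_DEG) % 360)
    if action == "turn_right":
        return Pose(pose.x, pose.y, (pose.heading_deg - TURN_STEP_DEG) % 360)
    if action == "move_forward":
        angle = math.radians(pose.heading_deg)
        return Pose(pose.x + MOVE_STEP_M * math.cos(angle),
                    pose.y + MOVE_STEP_M * math.sin(angle),
                    pose.heading_deg)
    raise ValueError(f"unknown action: {action}")

# Six left turns rotate the agent by 90 degrees, then it moves one step.
pose = Pose(0.0, 0.0, 0)
for action in ["turn_left"] * 6 + ["move_forward"]:
    pose = apply_action(pose, action)
print(pose.heading_deg, round(pose.x, 6), round(pose.y, 6))
```

With 15-degree turns the agent has only 24 possible headings, so the policy chooses from a handful of discrete actions instead of continuous motor commands.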

These simplifications enable developers to focus on the navigation and task-planning problems that AI agents must overcome in the TDW environment.

Gan told TechTalks that despite the levels of abstraction introduced in TDW, the robot still needs to address the following challenges:

The synergy between navigation and interaction: The agent cannot move to grasp an object if this object is not in the egocentric view, or if the direct path to it is obstructed.
Physics-aware interaction: Grasping might fail if the agent’s arm cannot reach an object.
Physics-aware navigation: Collision with obstacles might cause objects to be dropped and significantly impede transport efficiency.

This makes one appreciate the complexity of human vision and agency. The next time you go to a supermarket, consider how easily you can find your way through aisles, tell the difference between different products, reach for and pick up different items, place them in your basket or cart, and choose your path in an efficient way. And you’re doing all this without access to segmentation and depth maps and by reading items from a crumpled handwritten note in your pocket.

Pure deep reinforcement learning is not enough

TDW Transport Challenge high-level planner — Experiments show that hybrid AI models that combine reinforcement learning with symbolic planners are better suited to solve the ThreeDWorld Transport Challenge

The TDW-Transport Challenge is in the process of accepting submissions. In the meantime, the authors of the paper have already tested the environment with several known reinforcement learning techniques. Their findings show that pure reinforcement learning is very poor at solving task and motion planning challenges. A pure reinforcement learning approach requires the AI agent to develop its behavior from scratch, starting with random actions and gradually refining its policy to meet the goals in the specified number of steps.

According to the researchers’ experiments, pure reinforcement learning approaches barely managed to achieve above 10 percent success in the TDW tests.

“We believe this reflects the complexity of physical interaction and the large exploration search space of our benchmark,” the researchers wrote. “Compared to the previous point-goal navigation and semantic navigation tasks, where the agent only needs to navigate to specific coordinates or objects in the scene, the ThreeDWorld Transport challenge requires agents to move and change the objects’ physical state in the environment (i.e., task-and-motion planning), which the end-to-end models might fall short on.”

When the researchers tried hybrid AI models, where a reinforcement learning agent was combined with a rule-based high-level planner, they saw a considerable boost in the performance of the system.
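The article does not spell out the planner's rules, so the following is only a sketch of the general idea: a hand-written high-level planner picks the next sub-goal, and a learned low-level policy (not shown) would carry it out. The state fields, sub-goal names, and rule order are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    holding: int          # objects currently held
    capacity: int         # maximum objects the agent can hold
    visible_targets: int  # target objects in the current view
    remaining: int        # targets neither held nor delivered yet
    at_destination: bool

def choose_subgoal(state: AgentState) -> str:
    """Rule-based high-level planner: map the symbolic state to a sub-goal."""
    if state.holding > 0 and state.at_destination:
        return "put_down"
    if state.holding >= state.capacity or (state.holding > 0 and state.remaining == 0):
        return "go_to_destination"
    if state.visible_targets > 0:
        return "pick_up_target"
    return "explore"

# In a hybrid system, each sub-goal would be handed to a learned policy
# that outputs the low-level move and turn actions.
print(choose_subgoal(AgentState(0, 2, 0, 3, False)))  # explore
print(choose_subgoal(AgentState(0, 2, 1, 3, False)))  # pick_up_target
print(choose_subgoal(AgentState(2, 2, 1, 1, False)))  # go_to_destination
print(choose_subgoal(AgentState(1, 2, 0, 0, True)))   # put_down
```

The symbolic rules supply the task structure, so the reinforcement learning component no longer has to discover the whole transport strategy from random exploration.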

“This environment can be used to train RL models which fall short on these types of tasks and require explicit reasoning and planning abilities,” Gan said. “Through the TDW-Transport Challenge, we hope to demonstrate that a neuro-symbolic, hybrid model can improve this issue and demonstrate a stronger performance.”

The problem, however, remains largely unsolved, and even the best-performing hybrid systems had around 50-percent success rates. “Our proposed task is very challenging and could be used as a benchmark to track the progress of embodied AI in physically realistic scenes,” the researchers wrote.

Mobile robots are becoming a hot area of research and applications. According to Gan, several manufacturers and smart factories have already expressed interest in using the TDW environment for their real-world applications. It will be interesting to see whether the TDW Transport Challenge will help usher in new innovations in the field.

“We’re hopeful the TDW-Transport Challenge can help advance research around assistive robotic agents in warehouses and home settings,” Gan said.
