Reinforcement learning helped robots solve Rubik’s Cube—does it matter?

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

Last year, OpenAI, the famous San Francisco–based artificial intelligence lab, introduced Dactyl, a robotic hand that could handle various objects. This week, Dactyl showed off its new trick: solving a Rubik’s Cube in four minutes—with one hand.

In recent decades, the robotics industry has seen tremendous progress. But replicating the abilities of human hands remains a challenge. In recent years, thanks to advances in deep learning and artificial neural networks, scientists have moved closer to creating robotic hands that approach human dexterity.

In this regard, Dactyl’s latest feat is remarkable, even though it looks shaky and clumsy. Learning to handle the Rubik’s Cube shows how far research in robotics and artificial intelligence has come. But it also reminds us of the limits of current AI technologies and how much further we have to go.

OpenAI showcased its new research in a lengthy blog post and a lengthier paper.

Training the AI in a simulated environment

The main innovation in OpenAI’s robotic hand is not its capability to solve the Rubik’s Cube. There are already plenty of robots that can do it in under one second, though they look nothing like a human hand. The real challenge is doing it like humans.

To learn to manipulate the Cube as a human does, Dactyl uses reinforcement learning, a branch of AI that develops behavior through trial and error. Classical AI approaches require human engineers to explicitly program every move the robot must make, which is virtually impossible. In contrast, reinforcement learning enables the AI to discover the rules of handling objects on its own.
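To make "trial and error" concrete, here is a minimal sketch of tabular Q-learning on a toy problem. It is not Dactyl's training setup, which is far larger and uses neural networks; the corridor environment, the function names, and every hyperparameter below are assumptions chosen for illustration. The agent starts at the left end of a short corridor, is rewarded only when it reaches the right end, and has to discover by itself that stepping right pays off.

```python
import random


def train_corridor_agent(n_states=6, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor: start at cell 0, reward at the last cell."""
    rng = random.Random(seed)
    # q[state][action]; action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore at random with probability epsilon (or on ties),
            # otherwise exploit the current value estimates.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The goal is terminal, so it contributes no future value.
            future = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal cell."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

Nobody tells the agent which way to go. The preference for "right" emerges from hundreds of attempts, which also hints at why harder tasks need so many more trials.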

But reinforcement learning has its limits: it requires a massive number of trials to learn a task. Dactyl needed 10,000 years’ worth of RL training to learn to handle the Cube. Doing this in a physical environment would be impossible.

That’s why the OpenAI researchers used a simulated environment to train the AI. Simulation is a common approach in training AI models that are supposed to work in the physical world. Scientists often use simulation to train machine learning algorithms in areas such as self-driving cars and drone flight controllers.

OpenAI used a simulation environment to train its AI on solving the Rubik’s Cube under different conditions. (Source: YouTube)

The problem with simulated AI training is that it works well for limited environments such as games but is too simplistic for realistic conditions. The approach fails when the AI is supposed to operate in an open environment (e.g., driving on public roads). Open environments are hard to master because innumerable factors can affect them. They’re unpredictable, and mastering them requires common sense and generalization capabilities, which current AI technologies lack.

Neural networks are terrible at handling unpredictable situations and need to be trained on every type of problem and scenario they must handle. To solve this problem, the researchers at OpenAI developed a method called Automatic Domain Randomization (ADR). ADR constantly adds noise and randomness to the parameters of the simulated environment to help the AI generalize and respond to different situations.
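The core idea of randomizing simulation parameters can be sketched in a few lines. This is a simplified illustration, not OpenAI's implementation: the `ParamRange` class, the `update_range` function, the thresholds, and the symmetric widening are all assumptions made for the example. Each simulated episode draws a parameter (say, friction) from a range, and the range grows when the agent copes well with values at its edges and shrinks when it struggles.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Hypothetical randomization range for one simulator parameter."""
    low: float
    high: float
    step: float  # how much to widen or narrow per adjustment

    def sample(self, rng):
        # Each training episode draws a fresh value from the current range.
        return rng.uniform(self.low, self.high)


def update_range(r, boundary_success_rate, widen_above=0.8, narrow_below=0.4):
    """Widen the range when the agent succeeds at its edges, narrow it when it fails.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if boundary_success_rate >= widen_above:
        return ParamRange(r.low - r.step, r.high + r.step, r.step)
    if boundary_success_rate <= narrow_below and (r.high - r.low) > 2 * r.step:
        return ParamRange(r.low + r.step, r.high - r.step, r.step)
    return r


if __name__ == "__main__":
    rng = random.Random(0)
    friction = ParamRange(low=0.75, high=1.25, step=0.25)
    print("sampled friction:", friction.sample(rng))
    print("after a good evaluation:", update_range(friction, 0.9))
```

The design choice worth noting is that difficulty is tied to performance: the environment only becomes more varied once the agent has shown it can handle the current level of variation.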

When the trained AI model was transferred to the robotic hand, it performed consistently across different conditions. Videos shared by OpenAI show the robot handling various situations such as wearing a rubber glove, having its fingers tied, or being prodded by a plush giraffe.

Comparing AI to natural evolution

OpenAI’s Rubik’s Cube–playing robotic hand shows what we can achieve with the current blends of AI. Interestingly, Dactyl’s hardware is 15 years old, and the entire innovation is in the AI software that powers it.

However, solving the Rubik’s Cube is not a breakthrough, and it still displays the limits of AI. The robot drops the Cube eight times out of 10, which would classify it as clumsy by human standards.

Also, the AI techniques OpenAI used still suffer from the same limits as deep learning and neural networks in general. “Humans have evolved to be able to manipulate and operate our hands. There’s a huge amount of learning that has happened through evolution to get us to this point as a species. And a robot has to learn all of this from scratch,” says a video that introduces OpenAI’s Rubik’s Cube–playing robot.

Evolution can be a suitable example to follow for artificial intelligence. OpenAI recently created a reinforcement learning environment in which AI agents developed complex behavior over hundreds of millions of trials. But we still don’t know how to replicate the functions of evolution.

Scientists are still trying to figure out how to create AI that has both innate capabilities and learning capacity as most animals do. The closest thing we have to this is transfer learning, which is a crude way of copying the weights from one neural network to another to cut down the time of training. But transfer learning still has limited uses, especially in settings such as robotics and reinforcement learning.
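As a rough illustration of what "copying the weights" means, the sketch below represents a network as a dictionary of weight matrices and copies selected layers from a trained source network into a new one. It is a toy under stated assumptions: `init_network` and `transfer_weights` are hypothetical helpers, real frameworks handle this through their own model-loading tools, and the copied layers would normally be fine-tuned afterward.

```python
import copy
import random


def init_network(layer_sizes, rng):
    """Return {layer_name: weight matrix} filled with small random weights."""
    return {
        f"layer{i}": [[rng.uniform(-0.1, 0.1) for _ in range(n_out)]
                      for _ in range(n_in)]
        for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:]))
    }


def transfer_weights(source, target, layers_to_copy):
    """Copy the named layers from source into a copy of target.

    Layers that are not listed keep the target's fresh initialization.
    """
    result = copy.deepcopy(target)
    for name in layers_to_copy:
        src, dst = source[name], result[name]
        if len(src) != len(dst) or len(src[0]) != len(dst[0]):
            raise ValueError(f"shape mismatch in {name}")
        result[name] = copy.deepcopy(src)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    pretrained = init_network([4, 8, 3], rng)  # stands in for a trained model
    new_task = init_network([4, 8, 5], rng)    # same first layer, different output size
    model = transfer_weights(pretrained, new_task, ["layer0"])
    print(model["layer0"] == pretrained["layer0"])
```

The shape check shows one reason the technique is limited: weights can only be reused where the two networks share the same structure, and even then nothing guarantees that what the source network learned is relevant to the new task.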

That’s partly why it takes the AI model 10,000 years of simulated training to learn to handle the Rubik’s Cube. But I still think it’s wrong to draw an analogy between AI’s brute-force reinforcement learning and human evolution. Humans build on the innate capabilities they’ve developed over thousands of years to learn plenty of tricks during their lifetime that their ancestors didn’t know. How long will it take for Dactyl’s AI to learn a new skill, say rolling meditation balls?

Not a breakthrough for the AI and robotics industry

OpenAI trained an AI-powered robotic hand to solve the Rubik’s Cube (Image source: YouTube)

One of the things that OpenAI has highlighted in the paper is how the AI has enabled the robot to manipulate the Rubik’s Cube with just one hand. “Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it,” the researchers write.

This sentence can be misleading because it implies that the robot has achieved human or near-human dexterity. Humans (and other primates) have been designed to work with two hands, and they do it remarkably well. Working with a single hand puts us at a disadvantage.

In contrast, creating robots that can perform complicated tasks with two hands is much more difficult, because a second hand makes the reinforcement learning environment much more complicated. That’s why most robots you see today have a single hand. A two-handed Dactyl that can manipulate the Rubik’s Cube as a human does would probably take more time to train, which shows just how different robots are from humans.

Another thing that makes OpenAI’s latest achievement misleading is the Rubik’s Cube itself. For humans, solving the Cube is a cognitively demanding task. That’s why many people will subconsciously give extra credit to the robot for solving such a complicated problem, and with a single hand at that. But as I mentioned before, solving the Rubik’s Cube is already a solved problem for machines.

It would be interesting to see the robot do something simpler, such as lacing a shoe or buttoning a shirt. These are tasks that require nimble fingers but are not cognitively complex. I doubt it would be able to do either anytime soon.

Finally, it takes more than solving the Rubik’s Cube to create useful robots. Robotics and artificial intelligence still lack the generalization and reasoning skills required to function in open spaces. That’s why you don’t see robots roaming in homes and offices yet (except for the Roomba maybe).

For instance, I would like to see how the robot hand could operate different dishwashers and washing machines, hang clothes to dry, or grab a beer from the fridge. Those are tasks that need very little dexterity or cognitive skill but require common sense and general knowledge about the world.

I won’t go as far as saying OpenAI hyped its latest achievement (though it does have a history of doing this). But I would caution against getting too excited about it.
