DeepMind AlphaStar: AI breakthrough or pushing the limits of reinforcement learning?

DeepMind’s AI program AlphaStar managed to defeat 99.8 percent of StarCraft II players.

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

This week, UK-based artificial intelligence lab DeepMind published impressive new results from AlphaStar, its StarCraft II–playing AI. Earlier this year, DeepMind, a subsidiary of Alphabet, first introduced the AI, which could beat world-class players under limited conditions.

Ten months later, the new and improved AlphaStar has shown it can beat 99.8 percent of StarCraft II players. And it can do so under far fewer constraints than the original version. The researchers at DeepMind have developed novel techniques to solve more complicated problems with current AI technologies.

But AlphaStar’s new milestone achievement might also indicate how far we are from creating human-level artificial intelligence, DeepMind’s stated goal.

New achievements in artificial intelligence

The first version of AlphaStar was pretty remarkable. Introduced in January, the AI defeated two grandmasters of StarCraft II. However, the rules of the game were slightly “rigged” in favor of the AI. The matches took place on a single map and were limited to one of the three races in the popular strategy game. AlphaStar could also see all visible parts of the game map at once, whereas human players are limited to seeing the game through a camera view.

The rules might sound unfair, but the achievement is still remarkable. AlphaStar uses reinforcement learning, a branch of AI that develops its behavior through massive trial and error. The more open-ended a problem, the longer it takes for a reinforcement learning AI to master it.

And compared to the board games artificial intelligence had previously mastered, StarCraft was a real challenge. It’s a real-time strategy game, as opposed to turn-based games such as chess and Go. It is also a game of imperfect information; the players can’t see everything that’s happening across the map.

To put that in perspective: In chess, there are 400 possible positions after each player has made one move. In Go, that number rises to about 130,000. In StarCraft II, there are approximately 10^26 possible actions at every given moment.
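A quick way to see where figures like 400 and 130,000 come from, and how far 10^26 lies beyond them, is to do the arithmetic. This is a rough sketch that assumes the chess and Go numbers count the positions reachable after each side's first move (20 legal opening moves per side in chess, a 19×19 Go board); the article itself does not spell out how they were derived.

```python
# Positions after each side has made one move (assumed interpretation).
chess_first_moves = 20                      # 16 pawn moves + 4 knight moves
chess_positions = chess_first_moves * chess_first_moves

go_points = 19 * 19                         # 361 intersections on a full board
go_positions = go_points * (go_points - 1)  # Black plays, White takes any other point

starcraft_actions = 10 ** 26                # estimate of actions per time step

print(chess_positions)                      # 400
print(go_positions)                         # 129960
# StarCraft II's figure is more than 10^20 times larger than Go's.
print(starcraft_actions // go_positions > 10 ** 20)  # True
```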

Even with all the constraints, it took AlphaStar tens of millions of training games to reach grandmaster level in StarCraft II.

According to DeepMind, the new version of the AI can play all three races in the game (Protoss, Terran, and Zerg). The AI also plays on the same maps and under the same conditions as humans on Blizzard’s servers. The AI’s field of view is equal to that of a human player (a display screen’s worth of map data) instead of the entire map. And the number of actions (click, scroll, select, etc.) has been capped at 22 per five seconds, roughly equivalent to that of a pro player.
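To make the action limit concrete, here is a minimal sketch of a cap of 22 actions per five seconds. It assumes a sliding window, and the class name `ActionRateLimiter` and its method are hypothetical; DeepMind's actual mechanism may count actions differently.

```python
from collections import deque


class ActionRateLimiter:
    """Allow at most `max_actions` within any sliding `window` seconds."""

    def __init__(self, max_actions=22, window=5.0):
        self.max_actions = max_actions
        self.window = window
        self.timestamps = deque()  # times of recently accepted actions

    def try_act(self, now):
        # Forget accepted actions that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_actions:
            self.timestamps.append(now)
            return True
        return False  # over the cap: the action is dropped


limiter = ActionRateLimiter()
# A hypothetical agent attempts one action every 0.1 s for 5 s (50 attempts).
accepted = sum(limiter.try_act(t * 0.1) for t in range(50))
print(accepted)  # 22
```

However fast the agent tries to click, only 22 of its 50 attempts get through in the first five seconds.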

Every single one of these new factors makes the game considerably more difficult for the AI. And, remarkably, DeepMind’s AI has been able to master it.

How did AlphaStar master the game?

DeepMind’s success under these less constrained rules is largely due to the huge compute resources of its parent company. For the moment, successful AI models are those that can scale with the availability of data and computation. Therefore, if you can throw more matches and more powerful processors at the problem, you’ll probably be able to see new and better results.

DeepMind has also created a neural network for each of the different race types. Effectively, this means that it is training a separate AI model for each of the races. Again, Google’s vast arsenal of Tensor Processing Units (TPUs) made this possible.

But the AI algorithms powering AlphaStar’s new version also have some interesting innovations. AlphaStar uses self-play, a reinforcement learning technique in which the AI agent plays against itself numerous times. Self-play was also the technique used by AlphaZero, DeepMind’s monumental Go-playing AI. It enables AI agents to learn quickly without the need for human input, which can be costly (where would you get data from hundreds of millions of played games?).

However, self-play has its disadvantages. AI models built on self-play tend to develop illogical behavior. They also become very brittle, performing optimally under specific conditions and breaking with the slightest change to the environment. To counter this, DeepMind uses multiple AI agents whose goal is not only to win but also to grow stronger by discovering flaws in their opponents. This technique enables the AI to develop more generalizable behavior.
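One way to picture agents that grow stronger by probing opponents' flaws is an opponent-selection rule that favors the opponents an agent currently loses to. The following is a toy sketch, not DeepMind's implementation: the function names, the `(1 - p) ** power` weighting, the opponent names, and the win rates are all assumptions made up for illustration.

```python
import random


def opponent_weights(win_rates, power=2.0):
    """Weight each opponent by how often the learner loses to it.

    win_rates maps opponent name -> learner's estimated win probability.
    The (1 - p) ** power weighting is an illustrative assumption.
    """
    raw = {name: (1.0 - p) ** power for name, p in win_rates.items()}
    total = sum(raw.values())
    if total == 0:
        # The learner beats everyone: fall back to uniform sampling.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


def sample_opponent(win_rates, rng=random):
    """Draw the next opponent, favoring those the learner struggles against."""
    weights = opponent_weights(win_rates)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical win rates of one learner against three other agents.
win_rates = {"past_self": 0.9, "exploiter": 0.5, "rival": 0.2}
weights = opponent_weights(win_rates)
print(max(weights, key=weights.get))  # rival
print(sample_opponent(win_rates, random.Random(0)))
```

Because the learner mostly faces the opponents that beat it, it cannot settle into a strategy that only works against copies of itself.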

Figure: AlphaStar’s self-play mechanism (source: DeepMind)

AlphaStar also uses imitation learning, observing human replays to develop behavior for exploring StarCraft’s complex environment.

Some caveats and considerations on AlphaStar

In the blog post that introduced AlphaStar’s new version, DeepMind’s AI researchers state, “Ultimately, these results provide strong evidence that general-purpose learning techniques can scale AI systems to work in complex, dynamic environments involving multiple actors.”

While the new version of AlphaStar is remarkable, I wouldn’t call it a breakthrough toward artificial general intelligence. It’s more akin to the incremental advances we’ve seen in recent years.

For one thing, unlike AlphaZero, which mastered Go with zero input from humans, AlphaStar still relies heavily on human data. The AI researchers have made it clear in several parts of their paper that AlphaStar needs data from human-played games. This is significant because it shows that current AI techniques still lack the capabilities to explore open environments.

Also, the AI researchers did not clearly state the format in which the AI receives information about the game. According to both the blog post and accompanying paper, the AI uses a “camera-like interface” to analyze the game map. But this does not mean that AlphaStar views the same sophisticated graphics that humans do. Processing that kind of data would require extra efforts in computer vision.

Instead, AlphaStar gets raw game data, though limited to the amount it would see had it been looking through a camera. I wouldn’t exactly call it cheating. Just as we can’t expect humans to play StarCraft by staring at a table of raw game data, we shouldn’t expect AI to necessarily master the game by staring at graphics tailored for the human visual system.
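To illustrate what raw game data limited to a camera view could look like, here is a small sketch. The `Unit` type, the coordinates, and the rectangular camera are hypothetical; the actual interface AlphaStar uses is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    x: float
    y: float


def units_in_camera(units, cam_x, cam_y, width, height):
    """Return the units whose position lies inside the camera rectangle.

    (cam_x, cam_y) is the rectangle's lower-left corner in map coordinates.
    """
    return [
        u for u in units
        if cam_x <= u.x < cam_x + width and cam_y <= u.y < cam_y + height
    ]


# Hypothetical raw data for the whole map.
all_units = [Unit("probe", 10, 12), Unit("zealot", 40, 8), Unit("nexus", 14, 15)]
visible = units_in_camera(all_units, cam_x=5, cam_y=5, width=20, height=15)
print([u.name for u in visible])  # ['probe', 'nexus']
```

The agent still receives structured records rather than pixels, but only for the part of the map the camera currently covers.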

It does, however, highlight the stark differences between contemporary AI and human intelligence. Had AlphaStar been learning to play StarCraft by processing its in-game graphics, it would have had to solve various new challenges. For instance, the slightest change to the coloring or shape of units could make them undetectable for the AI. AlphaStar would have to go through extra training cycles on different variations of the graphics.

Also, AlphaStar might be able to beat all but the few best players in the world, but it still lacks many skills that novice StarCraft players have. For instance, anyone who has played StarCraft II to a modest level will be able to jump into a similar game such as Warcraft or Command & Conquer and quickly become comfortable with the gaming environment. But for DeepMind’s super-powerful AI, every new game is a new challenge that requires millions of training matches.

AI researcher and cognitive scientist Gary Marcus has published a set of tests for evaluating the behavior of the AI. These would be useful benchmarks to grade future iterations of AlphaStar and similar AI models.

In conclusion, I would say that AlphaStar is an outstanding achievement for the field of artificial intelligence, especially reinforcement learning. The results of game-playing AI research will surely help in domains such as robotics and resource management. The San Francisco–based OpenAI has used similar AI techniques to develop cutting-edge robotic hands. DeepMind’s AlphaStar will also find its real-world use cases soon.

But while AlphaStar shows us that we can still push the limits of AI, it also reminds us of the challenges we must overcome to replicate the human brain, nature’s most complex product.
