AI made history on Saturday as neural networks defeated human world champions in a best-of-three contest in Dota 2, a popular and complex online strategy game. OpenAI Five, the AI agent developed by the namesake research lab, pulled off the feat after losing match-ups against professional Dota 2 players at The International tournament last August.
Games have historically been one of the main arenas to test the progress of artificial intelligence algorithms. Initial efforts involved developing AI models that could play board games such as chess and checkers, and eventually the complicated Chinese game of Go. More recently, AI researchers have turned their attention toward video games, which are significantly more complicated and challenging.
In this regard, OpenAI Five’s victory marked a milestone achievement for the AI community.
In Dota 2, two teams of five players compete against each other. The ultimate goal is to destroy the large structure at the heart of the enemy’s base, known as the “Ancient,” while protecting your own. Each player controls one of the many “hero” characters the game features. Mastering Dota 2 requires combat tactics, resource management, skillful use of special abilities and long-term strategy. The game is played in real time (as opposed to turn-based), which makes it all the more difficult.
In a nutshell, Dota 2 is a game that is easy to learn, hard to master.
OpenAI has yet to publish the technical details of its Dota 2–playing AI, but it has shared some initial information about how the model developed its ability to play the game. The new AI is an advanced version of the original model, which OpenAI introduced in June 2018.
OpenAI Five is a team of five neural networks (hence the name), one for each of the five hero characters on a team. Neural networks are software constructs that develop their behavior by analyzing large data sets and finding correlations and patterns.
OpenAI Five trains its neural networks by using reinforcement learning, a subset of AI where the model is given the rules of the environment and a reward to pursue. The AI is then left to its own devices to try different combinations and figure out successful sequences that can maximize the reward.
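The trial-and-error loop described above can be sketched in a few lines. The following toy example uses tabular Q-learning on a one-dimensional “lane” where reaching the far end yields the only reward; the agent is given just the rules and the reward, then discovers a winning action sequence on its own. This is an illustrative sketch of reinforcement learning in general, not OpenAI’s actual (far larger, neural-network-based) training setup.

```python
import random

# Toy RL environment: positions 0..4 on a lane; reaching position 4
# (the "enemy side") yields reward 1. The agent learns a Q-value for
# each (state, action) pair by trial and error.
N_STATES = 5
ACTIONS = [1, -1]                # step right or left
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(500):             # training episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        # standard Q-learning update toward reward + discounted future value
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy steps right from every position.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # -> [1, 1, 1, 1]
```

OpenAI Five replaces the lookup table with neural networks and the toy lane with the full game, but the underlying loop of act, observe reward, and update is the same idea.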
Reinforcement learning, especially when combined with deep neural networks, is one of the most advanced areas of machine learning and is the main method used in other game-playing AI models such as DeepMind’s AlphaZero and AlphaStar.
In the case of Dota 2, OpenAI’s neural networks must find combinations that will help them move toward the many small and large goals of the game, such as gathering resources, making their heroes stronger, destroying enemy heroes and destroying the enemy team’s Ancient.
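In practice, those many small and large goals are typically folded into a single scalar reward the learner maximizes, a technique known as reward shaping. The event names and weights below are made up for illustration and are not OpenAI’s actual reward function.

```python
# Hypothetical shaped reward combining Dota-like subgoals into one
# scalar signal. Weights are illustrative only: the ultimate goal
# (destroying the Ancient) dominates, while smaller subgoals provide
# frequent feedback during training.
REWARD_WEIGHTS = {
    "gold_gained":       0.006,  # resource gathering (per unit of gold)
    "hero_level_up":     1.0,    # making your hero stronger
    "enemy_hero_kill":   2.0,
    "tower_destroyed":   5.0,
    "ancient_destroyed": 50.0,
}

def shaped_reward(events):
    """Sum the weighted contribution of each game event this tick."""
    return sum(REWARD_WEIGHTS[name] * count for name, count in events.items())

tick = {"gold_gained": 500, "enemy_hero_kill": 1, "tower_destroyed": 1}
print(shaped_reward(tick))  # 0.006*500 + 2.0 + 5.0, i.e. about 10.0
```

Dense subgoal rewards like these give the agent useful feedback long before a match is won or lost, which is what makes learning tractable in a game this long.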
Comparing OpenAI Five to human players
While at first glance, reinforcement learning roughly mimics the way humans learn to play games, beneath the surface it is very different. Neural networks take in huge amounts of data, much more than a human needs to master a game. They also need a lot of compute power.
OpenAI Five trained on 45,000 years’ worth of games over ten months, consuming some 800 petaflop/s-days of compute. To put that in perspective, an Intel Core i7–970, a high-end consumer PC processor, averages around 109 gigaflops (billions of floating-point operations per second).
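A quick back-of-envelope calculation makes the gap concrete. Taking OpenAI’s reported training compute as 800 petaflop/s-days and the desktop chip’s rate as roughly 109 gigaflops, we can estimate how long a single such CPU would need to perform the same number of operations:

```python
# Back-of-envelope: how long would one desktop CPU take to match the
# training compute? Figures are the ones cited in the article; the
# result is an order-of-magnitude estimate, not a precise benchmark.
PFLOP = 1e15
GFLOP = 1e9
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

total_flops = 800 * PFLOP * SECONDS_PER_DAY   # petaflop/s-days -> total operations
cpu_rate = 109 * GFLOP                        # operations per second

years = total_flops / cpu_rate / SECONDS_PER_YEAR
print(round(years))  # on the order of twenty thousand years
```

In other words, the ten months of training represent a workload a single desktop processor could not finish in recorded human history.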
The AI’s playing style is also different from that of human players. In the first game against the team of human champions, OpenAI Five used tactics that seemed peculiar. For instance, the AI used in-game currency to immediately revive dead heroes, even early in the game, something that professional players don’t usually do.
According to OpenAI CTO Greg Brockman, the AI favors strategies that yield short-term gains. This reflects OpenAI Five’s shortcomings in long-term strategizing and planning, a skill human players acquire with relatively little training. However, these same short-term tactics helped the AI beat the human champions.
We see this happen in test games all the time: the bots buy back, the humans laugh, and then the humans lose. Hard to know if it’ll happen here too…
— Greg Brockman (@gdb) April 13, 2019
But the entire comparison of AI and human intelligence is flawed, as some experts point out. “Humans are cheating to some degree as [Dota] was designed with an average human in mind. Humans have (quite inefficiently) evolved over a long period of time to be highly effective at many of the tasks demanded from a game of DotA,” says AI researcher Stephen Merity, in written comments to TechTalks.
Humans already have an understanding of the many concepts in Dota, such as combat, defense, resource planning, cooperation, and more. AI models start with a clean slate and with zero knowledge.
“Machine learning algorithms come in to this with relatively few preconceptions about the task. The 45,000 years of training here is obviously a great deal, but the model is learning a complex set of subgoals and sub-objectives that eventually result in the model winning or losing,” Merity says.
The limits of artificial intelligence in playing Dota 2
Despite the vast amount of resources OpenAI has at its disposal to train its neural networks, it is still very hard to create an AI that plays perfect Dota 2 with all its various parameters. That’s why OpenAI introduced limits to the game to make it a little easier for the AI.
Out of the 117 different characters available in the game, OpenAI limited the competition to 17 characters. Given that each game involves ten heroes, this reduces the number of possibilities from approx. 89 trillion (117 choose 10) to 19,448 (17 choose 10).
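The draft arithmetic above can be checked in a couple of lines with Python’s standard-library `math.comb`:

```python
from math import comb

# Number of distinct ten-hero pools when drafting from the full roster
# of 117 heroes versus the restricted pool of 17 used in the match.
full_pool = comb(117, 10)        # about 89 trillion
restricted_pool = comb(17, 10)   # 19,448

print(restricted_pool)           # -> 19448
print(full_pool // restricted_pool)  # reduction factor in the billions
```

Note that this counts unordered pools of ten heroes, ignoring which team picks which hero; the point is simply the scale of the reduction.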
“The combinatorial possibilities are the main reason why DotA is fascinating to human players. Whilst OpenAI Five is certainly a success, the fact that the game would likely still fall to an amateur player when it had to play on ‘real ground’ (i.e. all character combinations) is still a major limit,” Merity says.
It’s worth noting that this calculation does not take into account the different strengths and weaknesses of each character and how those would affect the training of the network. Current AI technologies are not good at learning abstract concepts and transferring knowledge to new situations. If we see something new in the game, such as a character we haven’t encountered before, we can quickly make decisions based on our previous experience and knowledge. For AI, a new character is almost like a new game that it has to learn from scratch. That’s why changes to the game’s parameters require a huge amount of retraining to bring the AI back to professional level.
According to OpenAI, they had to remove one of the characters from the competition because its abilities had changed in a recent update to the game.
“The models produced by OpenAI Five are still not flexible compared to standard human competition, where patches that change character abilities do sometimes come at very inconvenient times,” Merity observes.
The strengths of artificial intelligence in playing Dota 2
While we examine the weaknesses of AI’s game-playing skills, it’s also important to underline its strengths. In 2018, OpenAI Five lost to human champions. It turned out that the AI simply needed more training.
“We were expecting to need sophisticated algorithmic ideas, such as hierarchical reinforcement learning, but we were surprised by what we found: the fundamental improvement we needed for this problem was scale,” OpenAI notes in its blog post.
So while AI can’t mimic humans’ abstract thinking and common sense, it can perform its own type of “thinking” and “learning” at a very fast pace. Training OpenAI Five in super-fast forward for ten months brought it to the level of champions. According to OpenAI, the new model wins against the old AI in 99.9 percent of games.
“Like Deep Blue’s chess, this is a problem where you can throw substantial compute at generating different scenarios,” says Merity, referring to the AI that won against world chess champion Garry Kasparov in 1997. “Indeed no human could have manually curated or annotated those many centuries of gameplay!”
OpenAI Five’s evolution into a champion Dota 2–playing bot is a reminder that so far, successful AI methods are those that can scale as data and compute resources become increasingly available. While this is not an approach that works in all scenarios, it surely helps in areas like playing games, where the AI has to explore and compare a large number of different scenarios and combinations.
Cooperation between AI and humans
One of the interesting features of Saturday’s event was the cooperation between humans and AI. After the human-vs-AI competition, OpenAI set up a match in which each team was made up of two human players and three bots.
“Our testers reported feeling supported by their bot teammates, that they learned from playing alongside these advanced systems, and that it was generally a fun experience overall,” OpenAI notes, further explaining that the experience “presents a compelling vision for the future of human-AI interaction, one where AI systems collaborate and enhance the human experience.”
What was more interesting was that OpenAI Five pulled off the feat without any special configuration. The AI had been trained to play only alongside copies of itself, yet it managed to adapt to cooperating with human players without further training.
But the cooperation match also shows the challenges of bringing AI and humans together. As Merity notes, the match highlighted many flaws and shortcomings. “There was no clear way for humans to cooperate with the bots. They couldn’t coordinate strategy,” he says. This means the humans can’t predict or direct the moves made by the AI and can only hope that the bots find a way to blend in with their strategy.
Okay, there's some degree of cooperation, but this feels very much like a bot taking pity on a human, like when my cat puts a dead mouse on my pillow as he thinks I'm too thin and don't spend enough time hunting for myself 🤣https://t.co/blXjUzsv8R
— Smerity (@Smerity) April 13, 2019
This would sometimes result in awkward situations. “The bots would give up on humans and leave them to fight their own battles whilst going off elsewhere. It seemed a relatively lonely cooperative match,” Merity notes.
The AI and human players also had no effective method to communicate.
OpenAI will be holding an event called “OpenAI Five Arena” from Thursday through Sunday, during which anyone will be able to play Dota 2 against and alongside the champion AI.
“Seeing how humans interact when they have the opportunity to play alongside the bots in the short OpenAI Five ‘live’ period will be interesting to see,” Merity says.
What OpenAI Five tells us about the future of AI
There’s no point in spending immense resources to teach AI to play games if it doesn’t serve real-world use cases. OpenAI employed the same training system behind the first version of Five to teach a robotic hand to manipulate objects using reinforcement learning. It will be interesting to see what the updated AI model will achieve.
But while games are good arenas to train AI models, the real world is much more complex. “Unfortunately a sufficiently complex and interesting simulator for real world events is still quite rare,” Merity says.
OpenAI was founded with the goal of creating artificial general intelligence, the kind of AI that can replicate the general problem-solving abilities of the human mind. So far, no technology comes close to mimicking human-level intelligence.
“What OpenAI is trying to do is build general artificial intelligence and to share those benefits with the world and make sure it’s safe,” OpenAI CEO Sam Altman told The Verge after Saturday’s event. “We’re not here to beat video games, as fun as that is. We’re here to uncover secrets along the path to AGI.”
But it’s not clear whether teaching AI to master Dota 2 has moved us closer to this goal, partly because we still don’t know what artificial general intelligence is, and there are still many unanswered questions about the human brain.
“We have no clue what mastery of one game may give us in terms of intelligence. Does it take wisdom to play a game of chess well? Does it take wisdom to play a game of DotA or StarCraft well? We’re still waiting to see,” Merity says.