*This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.*

The Game of Life is a grid-based automaton that is very popular in discussions about science, computation, and artificial intelligence. It is an interesting idea that shows how very simple rules can yield very complicated results.

Despite its simplicity, however, the Game of Life remains a challenge to artificial neural networks, AI researchers at Swarthmore College and the Los Alamos National Laboratory have shown in a recent paper. Titled, “It’s Hard for Neural Networks To Learn the Game of Life,” their research investigates how neural networks explore the Game of Life and why they often miss finding the right solution.

Their findings highlight some of the key issues with deep learning models and give some interesting hints at what could be the next direction of research for the AI community.

## What is the Game of Life?

British mathematician John Conway invented the Game of Life in 1970. Basically, the Game of Life tracks the on or off state—the life—of a series of cells on a grid across timesteps. At each timestep, the following simple rules define which cells come to life or stay alive, and which cells die or stay dead:

- If a live cell has less than two live neighbors, it dies of underpopulation.
- If a live cell has more than three live neighbors, it dies of overpopulation.
- If a live cell has exactly two or three live neighbors, it survives.
- If a dead cell has three live neighbors, it will come to life.

Based on these four simple rules, you can adjust the initial state of your grid to create interesting stable, oscillating, and gliding patterns.

For instance, this is what’s called the glider gun.

You can also use the Game of Life to create very complex pattern, such as this one.

Interestingly, no matter how complex a grid becomes, you can predict the state of each cell in the next timestep with the same rules.

With neural networks being very good prediction machines, the researchers wanted to find out whether deep learning models could learn the underlying rules of the Game of Life.

## Artificial neural networks vs the Game of Life

There are a few reasons the Game of Life is an interesting experiment for neural networks. “We already know a solution,” Jacob Springer, a computer science student at Swarthmore College and co-author of the paper, told *TechTalks*. “We can write down by hand a neural network that implements the Game of Life, and therefore we can compare the learned solutions to our hand-crafted one. This is not the case in.”

It is also very easy to adjust the flexibility of the problem in the Game of Life by modifying the number of timesteps in the future the target deep learning model must predict.

Also, unlike domains such as computer vision or natural language processing, if a neural network has learned the rules of the Game of Life it will reach 100 percent accuracy. “There’s no ambiguity. If the network fails even once, then it is has not correctly learned the rules,” Springer says.

In their work, the researchers first created a small convolutional neural network and manually tuned its parameters to be able to predict the sequence of changes in the Game of Life’s grid cells. This proved that there’s a minimal neural network that can represent the rule of the Game of Life.

Then, they tried to see if the same neural network could reach optimal settings when trained from scratch. They initialized the parameters to random values and trained the neural network on 1 million randomly generated examples of the Game of Life. The only way the neural network could reach 100 percent accuracy would be to converge on the hand-crafted parameter values. This would imply that the AI model had managed to parameterize the rules underlying the Game of Life.

But in most cases the trained neural network did not find the optimal solution, and the performance of the network decreased even further as the number of steps increased. The result of training the neural network was largely affected by the chosen set training examples as well as the initial parameters.

Unfortunately, you never know what the initial weights of the neural network should be. The most common practice is to pick random values from a normal distribution, therefore settling on the right initial weights becomes a game of luck. As for the training dataset, in many cases, it isn’t clear which samples are the right ones, and in others, there’s not much of a choice.

“For many problems, you don’t have a lot of choice in dataset; you get the data that you can collect, so if there is a problem with your dataset, you may have trouble training the neural network,” Springer says.

## The performance of larger neural networks

In machine learning, one of the popular ways to improve the accuracy of a model that is underperforming is to increase its complexity. And this technique worked with the Game of Life. As the researchers added more layers and parameters to the neural network, the results improved and the training process eventually yielded a solution that reached near-perfect accuracy.

But a larger neural network also means an increase in the cost of training and running the deep learning model.

On the one hand, this shows the flexibility of large neural networks. Although a huge deep learning model might not be the most optimal architecture to address your problem, it has a greater chance of finding a good solution. But on the other, it proves that there is likely to be a smaller deep learning model that can provide the same or better results—if you can find it.

These findings are in line with “The Lottery Ticket Hypothesis,” presented at the ICLR 2019 conference by AI researchers at MIT CSAIL. The hypothesis suggested that for each large neural network, there are smaller sub-networks that can converge on a solution if their parameters have been initialized on lucky, winning values, thus the “lottery ticket” nomenclature.

“The lottery ticket hypothesis proposes that when training a convolutional neural network, small lucky subnetworks quickly converge on a solution,” the authors of the Game of Life paper write. “This suggests that rather than searching extensively through weight-space for an optimal solution, gradient-descent optimization may rely on lucky initializations of weights that happen to position a subnetwork close to a reasonable local minima to which the network converges.”

## What are the implications for AI research?

“While Conway’s Game of Life itself is a toy problem and has few direct applications, the results we report here have implications for similar tasks in which a neural network is trained to predict an outcome which requires the network to follow a set of local rules with multiple hidden steps,” the AI researchers write in their paper.

These findings can apply to machine learning models used logic or math solvers, weather and fluid dynamics simulations, and logical deduction in language or image processing.

“Given the difficulty that we have found for small neural networks to learn the Game of Life, which can be expressed with relatively simple symbolic rules, I would expect that most sophisticated symbol manipulation would be even more difficult for neural networks to learn, and would require even larger neural networks,” Springer said. “Our result does not necessarily suggest that neural networks cannot learn and execute symbolic rules to make decisions, however, it suggests that these types of systems may be very difficult to learn, especially as the complexity of the problem increases.”

The researchers further believe that their findings apply to other fields of machine learning that do not necessarily rely on clear-cut logical rules, such as image and audio classification.

For the moment, we know that, in some cases, increasing the size and complexity of our neural networks can solve the problem of poorly performing deep learning models. But we should also consider the negative impact of using larger neural networks as the go-to method to overcome impasses in machine learning research. One outcome can be greater energy consumption and carbon emissions caused from the compute resources required to train large neural networks. On the other hand, it can result in the collection of larger training datasets instead of relying on finding ideal distribution strategies across smaller datasets, which might not be feasible in domains where data is subject to ethical considerations and privacy laws. And finally, the general trend toward endorsing overcomplete and very large deep learning models can consolidate AI power in large tech companies and make it harder for smaller players to enter the deep learning research space.

“We hope that this paper will promote research into the limitations of neural networks so that we can better understand the flaws that necessitate overcomplete networks for learning. We hope that our result will drive development into better learning algorithms that do not face the drawbacks of gradient-based learning,” the authors of the paper write.

“I think the results certainly motivate research into improved search algorithms, or for methods to improve the efficiency of large networks,” Springer said.

Interesting project – but wouldn’t it be far simpler to use a decision tree, instead of a vast neural network, to predict a given cell being alive or dead? You’d only need two predictor variables: the sum of alive cells in the 8 surrounding cells, and whether the current cell is alive or dead.

See no problem. 1-step can be easily predicted with just 1 hidden convolutional layer with 2 neurons: https://gist.github.com/failure-to-thrive/61048f3407836cc91ab1430eb8e342d9 Although it’s a mere memorization. As of multi-step predictions: it inevitable requires nested calculations of neighbors. Could anyone write an algorithm for just 2 steps similar to 1-step? 😉 No surprise that the computational complexity blows exponentially. No guilt of DNN, they do their best.

I discovered that the root cause of lottery ticket initialisation is when a neural network gets stuck in a saddle point between multiple valid linear algebra solutions.

One option to increase the size of the neural network, as it becomes increasingly difficult to create true saddle points in high-dimension systems.

The other option is to make the neural network smaller. I went as far as hand-solving a ‘2*(3×3) + 2 + 1’ neural network which had three solutions (causing lottery ticket initialisation), but was able to train the smallest network with only ‘1*(3×3) + 2 + 1’ = 11 weights, for which only a single linear algebra solution exists.

This is how I solved the lottery ticket initialisation problem.

https://www.kaggle.com/code/jamesmcguigan/its-easy-for-neural-networks-to-learn-game-of-life