This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.
From ancient mythology to modern science fiction, humans have been dreaming of creating artificial intelligence for millennia. But the endeavor of synthesizing intelligence only began in earnest in the mid-1950s, when a dozen scientists gathered at Dartmouth College, NH, for a two-month workshop to create machines that could “use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.”
The workshop marked the official beginning of AI history. But the two-month effort, and many others that followed, only proved that human intelligence is very complicated, and that the complexity becomes more evident the more you try to replicate it.
That is why, despite six decades of research and development, we still don’t have AI that rivals the cognitive abilities of a human child, let alone one that can think like an adult. What we do have, however, is a field of science that is split into two different categories: artificial narrow intelligence (ANI), what we have today, and artificial general intelligence (AGI), what we hope to achieve.
What are the requirements of AGI?
Defining artificial general intelligence is very difficult. “General” already implies that it’s a very broad term, and even if we consider human intelligence as the baseline, not all humans are equally intelligent.
But there are several traits that a generally intelligent system should have, such as common sense, background knowledge, transfer learning, abstraction, and causality. These are the kinds of abilities you see in all humans from an early age.
The best way to see what a general AI system could do is to provide some challenges:
Challenge 1: What would happen in the following video if you removed the bat from the scene?
This is a challenge that requires the AI to have an understanding of physical dynamics and causality. It should also be able to reason about counterfactuals, alternative scenarios where you make changes to the scene.
Challenge 2: Consider the following text, mentioned in Rebooting AI by Gary Marcus and Ernest Davis: “Elsie tried to reach her aunt on the phone, but she didn’t answer.” Now answer the following questions:
- Who didn’t answer the phone?
- Were Elsie and her aunt in the same room?
This challenge requires the AI to have basic background knowledge about telephone conversations. Most people know about remote communication and how telephones work, and therefore they can infer many things that are missing from the sentence, such as who the ambiguous pronoun “she” refers to.
Challenge 3: Enter a random house and make a cup of coffee. Add some milk and sugar.
This challenge requires the AI agent to have a general understanding of how houses are organized. It should have basic knowledge such as the following: Food items are usually found in the kitchen. The kitchen is usually located on the first floor of the home. Coffee is stored in the cupboard. Milk has to be kept in the refrigerator. The AI must locate the coffeemaker, and if there isn’t one, it must be able to improvise. Most humans solve these and dozens of other problems subconsciously.
Challenge 4: Try to guess the next image in the following sequence, taken from François Chollet’s ARC dataset.
While these challenges are simple and straightforward for humans, solving them in a general way is still beyond today’s AI systems.
Following are the two main approaches to AI and why neither can solve the problem of artificial general intelligence on its own.
Symbolic AI vs general AI
The early efforts to create artificial intelligence focused on creating rule-based systems, also known as symbolic AI. Symbolic AI is premised on the idea that the human mind manipulates symbols. We have mental representations for objects, persons, concepts, states, actions, etc., and we use these representations (the symbols) to process the information we receive through our senses, to reason about the world around us, form intents, make decisions, etc.
Symbolic AI systems made early progress. Computer programming languages were created on the basis of symbol manipulation. Classes, structures, variables, functions, and other key components you find in every programming language were created to enable humans to convert symbols into computer instructions.
But symbolic AI has some fundamental flaws. It only works as long as you can encode the logic of a task into rules. But manually creating rules for every aspect of intelligence is virtually impossible. In the 1980s, AI scientists tried this approach with expert systems, rule-based programs that tried to encode all the knowledge of a particular discipline such as medicine. Expert systems were successful for very narrow domains but failed as soon as they tried to expand their reach and address more general problems. They also required huge efforts by computer programmers and subject matter experts.
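To make the idea of an expert system concrete, here is a minimal sketch of a forward-chaining rule engine, the kind of inference loop such systems were built around. The rules and fact names are hypothetical toy examples (not real domain knowledge), and the function name `infer` is illustrative. The second call shows the brittleness described above: a fact that no rule mentions derives nothing.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (conditions that must all hold, conclusion to add).
Rule = Tuple[FrozenSet[str], str]

# Hypothetical hand-written knowledge base; every rule has to be encoded manually.
RULES: List[Rule] = [
    (frozenset({"has_feathers"}), "is_bird"),
    (frozenset({"is_bird", "cannot_fly", "swims"}), "is_penguin"),
    (frozenset({"is_bird", "can_fly"}), "can_migrate"),
]


def infer(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Fire every rule whose conditions are satisfied until no new facts are derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    print(sorted(infer({"has_feathers", "cannot_fly", "swims"}, RULES)))
    # Outside the encoded domain, the system knows nothing.
    print(sorted(infer({"has_fur"}, RULES)))
```

Every new situation the system should handle means another rule written by a programmer and a subject matter expert, which is why this approach struggled to scale beyond narrow domains.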
Another problem with symbolic AI is that it doesn’t address the messiness of the world. An example is detecting objects in an image. Computers see visual data as patches of pixels, numerical values that represent the colors of points in an image. Each object in an image is represented by a block of pixels. The naïve approach to solving this problem with symbolic AI would be to create a rule-based system that compares the pixel values in an image against a known sequence of pixels for a specific object. The problem with this approach is that an object’s pixel values change depending on the angle at which it appears in the image, the lighting conditions, and whether it’s partially obscured by another object.
Consider, for instance, the following set of pictures, which all contain basketballs. It is clear that the pixel values of the basketball are different in each of the photos. In some of them, parts of the ball are shaded with shadows or reflect bright light. In some pictures, the ball is partly obscured by a player’s hand or the net. In others, the lines and lettering appear at different angles. And the ball’s size changes based on how far it is from the camera.
To solve this problem with a pure symbolic AI approach, you must add more rules: Gather a list of different basketball images in different conditions and add more if-then rules that compare the pixels of each new image to the list of images you have gathered. But there are virtually infinite ways a basketball can appear in a photo, and no matter how many images you add to your database, a rigid rule-based system that compares pixel-for-pixel will fail to provide decent object recognition accuracy.
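The failure mode of pixel-for-pixel rules can be shown in a few lines. This is a minimal sketch under toy assumptions: “images” are small lists of grayscale values, and `BALL_TEMPLATE`, `matches_template`, and `brighten` are hypothetical names invented for illustration. The rule finds the ball in the original scene, but a uniform change in lighting of just 10 intensity levels is enough to make it miss.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel intensities, 0-255

# Hypothetical 3x3 "ball" pattern stored in the rule base.
BALL_TEMPLATE: Image = [
    [0, 200, 0],
    [200, 200, 200],
    [0, 200, 0],
]


def matches_template(image: Image, template: Image) -> bool:
    """Rule: report a ball only if some window of the image equals the template exactly."""
    th, tw = len(template), len(template[0])
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            if window == template:
                return True
    return False


def brighten(image: Image, delta: int) -> Image:
    """Simulate different lighting by shifting every pixel, clipped to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


scene: Image = [
    [0, 0, 0, 0],
    [0, 0, 200, 0],
    [0, 200, 200, 200],
    [0, 0, 200, 0],
]

print(matches_template(scene, BALL_TEMPLATE))                # ball found
print(matches_template(brighten(scene, 10), BALL_TEMPLATE))  # same ball, slightly brighter: missed
```

Patching this by storing a brightened template only handles that one lighting level; rotations, shadows, occlusion, and scale each multiply the number of templates the rule base would need.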
And mind you, this is a basketball, a simple, spherical object that retains its shape regardless of the angle. Now imagine a more complex object, such as a chair, or a deformable object, such as a shirt. The complexity of the task will grow exponentially.
The history of AI and the study of human intelligence show that symbol manipulation is just one of several components of general AI.
General AI vs machine learning
Almost in parallel with research on symbolic AI, another line of research focused on machine learning algorithms, AI systems that develop their behavior through experience.
While machine learning algorithms come in many different flavors, they all have a similar core logic: You create a basic model, tune its parameters by providing it training examples, and then use the trained model to predict, classify, or generate new data.
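The create, tune, and use cycle can be illustrated with the simplest possible model. The sketch below assumes a one-variable linear model `y = w*x + b` trained by gradient descent on mean squared error; the training data (drawn from `y = 2x + 1`) and the learning rate and epoch count are illustrative choices, not values from the article.

```python
from typing import List, Tuple


def train_linear_model(
    xs: List[float], ys: List[float], lr: float = 0.05, epochs: int = 2000
) -> Tuple[float, float]:
    """Tune w and b of y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # step 1: a basic model with untrained parameters
    n = len(xs)
    for _ in range(epochs):  # step 2: adjust the parameters using training examples
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: float, b: float, x: float) -> float:
    """Step 3: use the trained parameters on an input the model has not seen."""
    return w * x + b


# Toy training examples generated from y = 2x + 1 (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))
print(round(predict(w, b, 5.0), 3))
```

Nobody writes the rule “multiply by 2 and add 1”; the parameters converge to it from the examples. Deep learning follows the same recipe with millions of parameters instead of two.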
The most popular branch of machine learning is deep learning, a field that has received a lot of attention (and money) in the past few years. At the heart of deep learning algorithms are deep neural networks, layers upon layers of small computational units that, when grouped together and stacked on top of each other, can solve problems that were previously off-limits for computers.
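Here is a minimal sketch of what “small computational units stacked in layers” means. Each unit computes a weighted sum of its inputs plus a bias and passes it through a ReLU nonlinearity. The weights below are hand-picked for illustration rather than learned, and the function names are hypothetical. With one hidden layer of two units, the network computes XOR, a function that no single linear unit can represent because its outputs are not linearly separable.

```python
from typing import List, Tuple

# Parameters of one unit: (weights for each input, bias).
UnitParams = Tuple[List[float], float]


def relu(z: float) -> float:
    return max(0.0, z)


def unit(inputs: List[float], weights: List[float], bias: float) -> float:
    """One small computational unit: weighted sum plus bias, then a nonlinearity."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs: List[float], params: List[UnitParams]) -> List[float]:
    """A layer is several units reading the same inputs."""
    return [unit(inputs, w, b) for w, b in params]


def network(x1: float, x2: float) -> float:
    # Hidden layer: two units with hand-picked (not learned) weights.
    hidden = layer([x1, x2], [([1.0, 1.0], 0.0), ([1.0, 1.0], -1.0)])
    # Output layer: a linear combination of the hidden units (no ReLU here).
    return 1.0 * hidden[0] - 2.0 * hidden[1]


for a in (0.0, 1.0):
    for c in (0.0, 1.0):
        print(a, c, network(a, c))
```

In a real deep network the same structure is repeated across many layers and thousands of units, and training (as in the previous step) finds the weights instead of a human choosing them.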
Neural networks are especially good at dealing with messy, non-tabular data such as photos and audio files. In recent years, deep learning has been pivotal to advances in computer vision, speech recognition, and natural language processing.
To return to the object-detection problem mentioned in the previous section, here’s how the problem would be solved with deep learning: First you create a convolutional neural network (convnet), a type of neural network that is especially good at processing visual data. Then, you train the AI model on many photos labeled with their corresponding objects. Finally, you test the model by providing it novel images and verifying that it correctly detects and labels the objects contained in them.
Instead of doing pixel-by-pixel comparison, deep neural networks develop mathematical representations of the patterns they find in their training data. Compared to symbolic AI, neural networks are more resilient to slight changes to the appearance of objects in images.
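The basic operation a convnet stacks is a small filter slid across the image. Below is a minimal pure-Python sketch of a “valid” 2D convolution, implemented as cross-correlation (no kernel flip), which is how deep learning libraries define it. The two-element edge kernel is hand-set for illustration; a real convnet learns many such kernels from labeled photos. Because this particular kernel’s weights sum to zero, brightening the whole image leaves its response unchanged, a small taste of why learned features can be less brittle than raw pixel comparison. This is not a claim that convnets are lighting-invariant in general.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution as deep learning libraries compute it (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-set vertical-edge detector: responds where brightness jumps from left to right.
EDGE_KERNEL: Matrix = [[-1.0, 1.0]]

image: Matrix = [
    [0.0, 0.0, 9.0, 9.0],
    [0.0, 0.0, 9.0, 9.0],
]
brighter: Matrix = [[p + 5.0 for p in row] for row in image]

print(conv2d(image, EDGE_KERNEL))     # strong response only at the edge column
print(conv2d(brighter, EDGE_KERNEL))  # identical response despite the lighting change
```

Compare this with the template rule in the symbolic AI section, which failed on the same kind of uniform brightness shift.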
But does deep learning solve the general AI problem? Certainly not. Neural networks have so far proven to be good at capturing spatial and temporal consistency in data. But they are very poor at generalizing their capabilities and reasoning about the world like humans do.
A well-trained neural network might be able to detect the baseball, the bat, and the player in the video at the beginning of this article. But it will be hard-pressed to make sense of the behavior and relation of the different objects in the scene. Neural networks also start to break when they deal with novel situations that are statistically different from their training examples, such as viewing an object from a new angle.
A huge language model might be able to generate a coherent text excerpt or translate a paragraph from French to English. But it does not understand the meaning of the words and sentences it creates. What it’s basically doing is predicting the next word in a sequence based on statistics it has gleaned from millions of text documents.
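“Predicting the next word from statistics” can be shown in its simplest form with a bigram model, which counts which word follows which. Large language models use deep neural networks over tokens rather than raw word-pair counts, so treat this only as a minimal sketch of the statistical idea; the tiny corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [
    "the ball hit the bat",
    "the ball flew over the fence",
    "the player caught the ball",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(predict_next(model, "caught"))
print(predict_next(model, "fence"))
```

The model will confidently continue “the” with “ball” without any notion of what a ball is; it only reflects co-occurrence counts in its training text.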
Also, without any kind of symbol manipulation, neural networks perform very poorly at many problems that symbolic AI programs can easily solve, such as counting items and dealing with negation. Neural networks lack the basic components you’ll find in every rule-based program, such as high-level abstractions and variables. That is why they require lots of data and compute resources to solve simple problems.
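For contrast, in a symbolic program counting and negation fall out of variables and a couple of generic operations. The sketch below assumes a hypothetical scene that has already been described as a list of symbolic labels; the function names are illustrative. The same two functions work for any label, including ones never seen before, with no training examples at all.

```python
from typing import List

# Hypothetical symbolic description of a scene.
scene: List[str] = ["player", "ball", "bat", "player", "umpire"]


def count(items: List[str], label: str) -> int:
    """Counting: works for any label, because `label` is a variable."""
    return sum(1 for item in items if item == label)


def absent(items: List[str], label: str) -> bool:
    """Negation: true when no item carries the label."""
    return label not in items


print(count(scene, "player"))
print(absent(scene, "glove"))
print(count(scene, "referee"))  # a label never mentioned anywhere still works
```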
In a nutshell, symbolic AI and machine learning replicate separate components of human intelligence. But it is evident that without bringing together all the pieces, you won’t be able to create artificial general intelligence.
When will artificial general intelligence become reality?
Scientists and experts are divided on the question of how many years it will take to break the code of human-level AI. But most agree that we’re at least decades away from AGI.
Nonetheless, as is the habit of the AI community, researchers stubbornly continue to plod along, unintimidated by six decades of failing to achieve the elusive dream of creating thinking machines.
Today, there are various efforts aimed at generalizing the capabilities of AI algorithms. Again, like many other things in AI, there are a lot of disagreements and divisions, but some interesting directions are developing.
Some scientists believe that the path forward is hybrid artificial intelligence, a combination of neural networks and rule-based systems. The hybrid approach, they believe, will bring together the strengths of both approaches, help overcome their shortcomings, and pave the way for artificial general intelligence.
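One simple way to picture the hybrid idea is a pipeline in which a neural network handles perception and hand-written rules handle reasoning. The sketch below is one hypothetical arrangement, not a description of any specific hybrid system: `fake_detector` stands in for a trained network and returns canned detections, `to_symbols` turns confident outputs into discrete symbols, and `can_hit_ball` is a toy rule echoing the bat-removal challenge from earlier.

```python
from typing import Dict, List, Set, Tuple

Detection = Tuple[str, float]  # (label, confidence)


def fake_detector(frame_id: int) -> List[Detection]:
    """Hypothetical stand-in for a trained neural network; returns canned detections."""
    canned: Dict[int, List[Detection]] = {
        0: [("player", 0.97), ("bat", 0.91), ("ball", 0.88), ("glove", 0.12)],
        1: [("player", 0.95), ("ball", 0.90)],  # the same scene with the bat removed
    }
    return canned.get(frame_id, [])


def to_symbols(detections: List[Detection], threshold: float = 0.5) -> Set[str]:
    """Neural-to-symbolic bridge: keep only confident labels as discrete symbols."""
    return {label for label, conf in detections if conf >= threshold}


def can_hit_ball(symbols: Set[str]) -> bool:
    """Hand-written rule: the ball can be hit only if a player, a bat, and a ball are present."""
    return {"player", "bat", "ball"} <= symbols


for frame in (0, 1):
    symbols = to_symbols(fake_detector(frame))
    print(frame, sorted(symbols), can_hit_ball(symbols))
```

The appeal is that the perception part tolerates messy pixels while the rule part generalizes to any scene that produces the right symbols; the open research question is how to connect the two without hand-writing rules for everything.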
Other scientists believe that pure neural network–based models will eventually develop the reasoning capabilities they currently lack. There is a lot of research on creating deep learning systems that can perform high-level symbol manipulation without the explicit instruction of human developers. Another interesting line of work is self-supervised learning, a branch of deep learning that aims to let AI systems learn about the world from unlabeled observation, much as human children do.
Will any of these approaches eventually bring us closer to AGI, or will they uncover more hurdles and roadblocks? Time will tell. But what’s for sure is that there will be a lot of exciting discoveries along the way.