AI’s struggle to reach “understanding” and “meaning”

Current AI systems can classify images, but perform very poorly when they must (Image credit: Depositphotos)

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

The short excerpt below from the 1938 film La Femme du Boulanger (The Baker’s Wife) ingeniously depicts how the human mind can extract deep meaning from life experiences and perceived situations. In the movie, directed by Marcel Pagnol, the baker Aimable welcomes his wife Aurelie, who has just come back after running off with a shepherd days earlier.

While Aimable treats Aurelie with sweet words and a heart-shaped bread (which he had baked for himself), he shows no kindness toward Pomponette, his female cat who coincidentally returns home at the same time as Aurelie, after abandoning her mate Pompon for a chat de gouttière (alley cat). Aimable calls Pomponette an ordure (junk) and a salope (a rude term) who has run off with un inconnu (a nobody) and a bon-à-rien (good-for-nothing) while the poor Pompon has been miserably searching for her everywhere.

While Aimable cuts the cat down to size with his tongue-lashing, Aurelie cringes in her seat and starts to sob.

“What does he have better than [Pompon]?” Aimable asks. “Rien (nothing),” answers Aurelie in a shaky voice, barely above a whisper. It’s not clear whether she’s talking about the stray alley cat or the shepherd boy.

“You say rien,” Aimable tells his wife in a sweet and soft voice. “But if she could talk,” he says, his voice becoming stern again as he returns his gaze back to Pomponette, “if she had no shame, if she didn’t fear to pain pauvre Pompon, she would say, ‘He is prettier.’” Again, there are many hidden meanings and accusations in his words.

As Aimable is rambling on, apparently oblivious to his wife’s reaction, Pomponette starts drinking milk from Pompon’s bowl. And that’s where he drives the dagger. “Look there,” he says. “This is why she returned. She was cold and hungry.”

Meanwhile, Aurelie is holding the heart-shaped bread that Aimable had baked—for himself.

Most humans can extract the deep meanings, metaphors, and intricate nuances hidden in the flaky grayscale image frames and noisy sound waves of this video sequence. We can empathize with Aimable and Aurelie (and map them to our own previous life experiences).

But the most advanced artificial intelligence technology we have today—our best imitation of the brain—can at best see people and faces, detect genders and objects, and provide very basic descriptions such as “a couple dining at a table.”

This is just a glimpse of the human mind’s ability to understand the world, and of how great a challenge its replication remains after six decades of artificial intelligence research.

“Humans are able to ‘actually understand’ the situations they encounter, whereas even the most advanced of today’s AI systems do not yet have a human-like understanding of the concepts that we are trying to teach them,” writes computer scientist and AI researcher Melanie Mitchell in her latest paper for AI Magazine.

In her paper, Mitchell, who is also a professor at Santa Fe Institute and the author of a recent book on artificial intelligence, discusses the struggles of current AI systems, namely deep learning, in extracting meaning from the information they process.

Deep learning systems are very good at ferreting out correlations among huge numbers of data points, but when it comes to digging deeper into the data and forming abstractions and concepts, they barely scratch the surface (even that might be an overstatement). We have AI systems that can locate objects in images and convert audio to text, but none that can empathize with Aurelie and appreciate her unease when her husband attacks Pomponette. In fact, our AI systems start to break as soon as they face situations that are slightly different from the data they’ve been trained on.

Some scientists believe that such limits will be overcome as we scale deep learning systems with larger neural networks and bigger datasets. But, Mitchell suspects, something more fundamental might be missing.

In 2018, Mitchell helped organize a three-day workshop at the Santa Fe Institute titled “Artificial Intelligence and the Barrier of Meaning.” The workshop explored concepts such as what is “meaning” and “understanding,” how to extract meaning from data and experience, and how understanding situations can help create AI systems that can generalize their abilities and are more robust to changes in their environment.

The result of the workshop, which Mitchell shares in her paper, gives some directions on how we can make more reliable AI systems in the future.

AI lacks innate abilities

Like the term “artificial intelligence,” the notions of “meaning” and “understanding” are hard to define and measure. Therefore, instead of trying to give the terms a formal definition, the participants in the workshop defined a list of “correlates,” abilities and skills closely tied to our capacity to understand situations. They also examined to what extent current AI systems enjoy these capacities.

“Understanding is built on a foundation of innate core knowledge,” Mitchell writes. Our basic understanding of physics, gravity, object persistence, and causality enables us to trace the relations between objects and their parts, think about counterfactuals and what-if scenarios, and act in the world with consistency. Recent research indicates that intuitive physics and causal models play a key role in our understanding of visual scenes, and scientists have described them as key components of the “dark matter” of computer vision.

Beyond physics, humans also have “innate or early-developed intuitive psychology,” Mitchell writes, which gives us the ability to analyze, empathize, and communicate with other social beings. Mitchell also speaks of “metacognition,” the ability to “explain and predict our own thought processes and decisions, and map them onto the thought processes of others.” These capabilities are essential for us to develop an idea of the scope of information we have and how relevant it is to solving problems. They also allow us to put ourselves in Aurelie’s shoes and imagine her feelings as she watches Aimable lash out at Pomponette.

Neural networks can’t extrapolate

Compared to humans, deep neural networks need much more data to learn new things. This is because, while neural networks are efficient at interpolating between data points they’ve seen during training, they’re terrible at dealing with situations not covered by their training data. Humans, on the other hand, are good at extrapolating their knowledge and experience to previously unseen situations because they “build abstract representations,” Mitchell writes. Abstraction is a powerful tool of the human mind. It’s what allows us to extract the high-level meanings of the movie excerpt we saw at the beginning of this article and compare them with things we already know.
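The gap between interpolation and extrapolation can be made concrete with a toy sketch. The code below is only an illustration under stated assumptions: it uses a nearest-neighbor averaging model as a stand-in for a learned model (not a neural network), and the target function and the training range of 0 to 3 are hypothetical choices made for the example.

```python
import math


def target(x):
    """Hypothetical ground-truth function the model is asked to learn."""
    return math.sin(x)


# Training data covers only the interval [0, 3].
train_x = [i * 0.1 for i in range(31)]
train_y = [target(x) for x in train_x]


def predict(x, k=3):
    """Average the labels of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def error(x):
    """Absolute difference between the model's guess and the true value."""
    return abs(predict(x) - target(x))


inside_error = error(1.55)   # falls between training points
outside_error = error(6.0)   # far outside anything seen in training

print(f"error inside the training range:  {inside_error:.4f}")
print(f"error outside the training range: {outside_error:.4f}")
```

Inside the training range the model can lean on nearby examples. Outside it, the model can only repeat what it saw at the edge of its data, so the error grows sharply.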

And unlike neural networks, which are trained in one phase and deployed in another, the human brain is an active learning machine that continues to adjust its knowledge throughout its entire life. “Perception, learning, and inference are active processes that unfold dynamically over time, involve continual feedback from context and prior knowledge, and are largely unsupervised,” Mitchell writes.

The AI and neuroscience community is divided on how the human mind acquires knowledge efficiently. Many scientists believe that the brain comes prewired with many capabilities. These innate capabilities, which we mostly take for granted, enable us to make sense of situations we’ve never seen before and to learn things with very few examples. Other researchers assert that, like artificial neural networks, the brain is a large interpolation machine that learns to fill the gaps between known data, and that we need to discover the secret algorithm that makes us efficient at extracting meaning from the world.

“I don’t think anyone knows the answer to this,” Mitchell told TechTalks in written comments. “I’m not even sure it’s an either/or—we likely have prewired capabilities in the brain that guide our early self-supervised learning.  We also probably have some prewired ‘facts’ about the world, such as how to identify that something is an ‘object.’”

Evolutionary considerations

Another area explored at the Santa Fe workshop was the need for AI systems to have a body to experience the world. “Understanding in living systems arises not from an isolated brain but rather from the inseparable combination of brain and body interacting in the world,” Mitchell writes, adding that the supporters of this hypothesis believe that a disembodied brain will not achieve human-like understanding.

“I think if you asked the people at the workshop, there would have been a lot of difference in opinion on what ‘embodiment’ means,” Mitchell told me. “But it certainly includes the ability to actively ‘sense’ the world in some form or another, emphasis on the ‘actively.’ I don’t think anyone can say that there is a single kind of ‘embodiment’ that is necessary for general intelligence.”

Evolution has also played a key role in shaping the mind of every living being to serve its physical needs. “Over the last decades evidence has emerged from neuroscience, psychology, and linguistics that supports the essential role of the body in virtually all aspects of thinking,” Mitchell writes. For instance, while chimpanzees are obviously less intelligent than humans, they have a much better short-term memory. Likewise, the minds of squirrels have evolved to remember thousands of food hideouts.

These are cognitive abilities that have developed over thousands and millions of generations and repeated interactions with the environment. “Perhaps the particular underlying structure of the brain is not as central to understanding as the evolutionary process itself,” Mitchell observes in her paper, adding that an evolutionary approach might open a path forward toward integrating meaning and understanding in AI systems.

In this respect, one of the benefits of artificial intelligence is that, where simulated environments allow, it can play evolutionary cycles in fast forward.

Understanding is not a loss function or a benchmark

Machine learning algorithms are designed to optimize for a cost or loss function. For instance, when a neural network undergoes training, it tunes its parameters to reduce the difference between its predictions and the human-provided labels, which represent the ground truth. This simplistic approach to solving problems is not what “understanding” is about, the participants at the Santa Fe Institute workshop argued. There’s no single metric to measure the level of understanding.
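To make the idea of optimizing a loss function concrete, here is a minimal sketch. It assumes a two-parameter linear model, a mean squared error loss, and plain gradient descent; the data, the learning rate, and the number of steps are hypothetical values chosen for illustration, and real neural networks have far more parameters.

```python
# Hypothetical labelled data: the labels follow y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_loss(w, b):
    """Mean squared difference between the model's predictions and the labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(inputs, labels)) / len(inputs)


def gradients(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(inputs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(inputs, labels)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(inputs, labels)) / n
    return grad_w, grad_b


w, b = 0.0, 0.0          # start from an uninformed guess
learning_rate = 0.05     # assumed step size
initial_loss = mse_loss(w, b)

for _ in range(2000):
    grad_w, grad_b = gradients(w, b)
    # Nudge each parameter in the direction that reduces the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse_loss(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss went from {initial_loss:.3f} to {final_loss:.6f}")
```

The loop has a single number to push down, and it knows nothing beyond that number. That is the contrast the workshop participants drew with understanding, which has no such single score.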

It’s unclear what should be “optimized” to achieve the correlates of understanding or “even if optimization itself is the right framework to be using,” Mitchell writes in her paper.

Another problem that plagues the AI community is the narrow focus on optimizing algorithms for specific benchmarks and datasets. In the past decade, many datasets have emerged that contain millions of examples in areas such as computer vision and natural language processing. These datasets allow AI researchers to train their algorithms and test their accuracy and performance. But while the hard work that has gone into curating these datasets is commendable and has contributed to many of the advances we’ve seen in AI in recent years, the datasets have also ushered in a culture that creates a false impression of achievement.

“Due to the incentives the field puts on successful performance on specific benchmarks, sometimes research becomes too focused on a particular benchmark rather than the more general underlying task,” Mitchell writes in AI Magazine.

When scoring higher on a dataset becomes the goal, it can lead to detrimental results. For instance, in 2015, a team of AI researchers from Baidu cheated to score higher than other competitors at ImageNet, a yearly computer vision competition. Instead of finding a novel algorithm that could classify images more accurately, the team managed to find a way to game the benchmark in violation of the contest’s rules.

The shortcomings of narrowly curated datasets have also been highlighted in more recent research. For instance, at the NeurIPS 2019 conference, a team of researchers at the MIT-IBM Watson AI Lab showed that algorithms trained on the ImageNet dataset performed poorly in real-world situations where objects are found in uncommon positions and lighting conditions.

ImageNet vs reality: In ImageNet (left column) objects are neatly positioned, in ideal background and lighting conditions. In the real world, things are messier (source:

“Many of the papers published using ImageNet focused on incremental improvement on the all-important ‘state of the art’ rather than giving any insight into what these networks were actually recognizing or how robust they were,” Mitchell writes.

Recently, there’s been a push to develop benchmarks and datasets that can better measure the general problem-solving capabilities of AI algorithms. A notable effort in this respect is the Abstraction and Reasoning Corpus (ARC) developed by Keras creator François Chollet. ARC challenges AI researchers to develop algorithms that can extract abstract meaning from data and learn to perform tasks with very few examples.
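To give a feel for what “very few examples” means here, the sketch below builds a made-up task in the style of ARC: a handful of input/output grid pairs plus a held-out test pair. The task itself, the three candidate rules, and the brute-force check are all hypothetical and purely illustrative. It is not an actual ARC problem, and enumerating hand-written rules is not how the benchmark is meant to be solved.

```python
# A made-up ARC-style task: grids are lists of rows, integers stand for colors.
task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [
        {"input": [[5, 0, 0, 6]], "output": [[6, 0, 0, 5]]},
    ],
}


def identity(grid):
    """Return the grid unchanged."""
    return [list(row) for row in grid]


def mirror_left_right(grid):
    """Flip every row horizontally."""
    return [list(reversed(row)) for row in grid]


def transpose(grid):
    """Swap rows and columns."""
    return [list(column) for column in zip(*grid)]


# Hypothetical, hand-written pool of candidate rules.
CANDIDATES = {
    "identity": identity,
    "mirror_left_right": mirror_left_right,
    "transpose": transpose,
}


def fits_all_examples(rule, pairs):
    """True if the rule reproduces every output from its input."""
    return all(rule(pair["input"]) == pair["output"] for pair in pairs)


# Keep only the rules that explain both training pairs.
consistent = [
    name for name, rule in CANDIDATES.items()
    if fits_all_examples(rule, task["train"])
]

# Check whether a surviving rule also generalizes to the unseen test pair.
generalizes = {
    name: fits_all_examples(CANDIDATES[name], task["test"]) for name in consistent
}
print(consistent, generalizes)
```

The difficulty in the real benchmark is that the underlying rule differs from task to task and is not drawn from a fixed pool, so the system must form the abstraction itself from two or three examples.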

“I agree with Chollet that abstraction and analogy—of the kind required in solving the ARC problems—are core aspects of intelligence that are under-studied in today’s AI research community, and that to make progress on the issues I outline in the ‘Crashing the Barrier of Meaning’ paper, we’ll have to figure out how to get machines to be able to do this kind of task,” Mitchell said in her comments to TechTalks. “But even if a machine could solve the ARC problems, it remains to be seen if it could use the same mechanisms to deal with abstraction and analogy in the real world, especially where language is concerned.”

The Abstraction and Reasoning Corpus (ARC), introduced by AI scientist François Chollet, tests intelligence systems with few training examples. (Source:

Finding meaning is an interdisciplinary challenge

“Our limited conception of what understanding actually involves makes it hard to answer basic questions: How do we know if a system is ‘actually understanding’? What metrics can we use? Could machines be said to ‘understand’ differently from humans?” Mitchell writes in her paper.

What made this specific study interesting was the broad range of perspectives brought together to tackle this complicated topic. Participants in the workshop came from various disciplines, including AI, robotics, cognitive and developmental psychology, animal behavior, information theory, and philosophy, among others.

“When I first got into AI, there was a real interdisciplinary feel to it. AI people attended cognitive science conferences, and vice versa. Then statistics took over AI, and the field got less diverse,” Mitchell said. “But I see a trend now in the field returning to its interdisciplinary roots, which I think is a very positive development.”

The paper includes many examples from studies in fields other than computer science and robotics, which help the reader appreciate the depth of meaning in living beings.

“For me the perspectives from people outside AI (in psychology, neuroscience, philosophy, etc.) helped show how these issues of ‘understanding’ and ‘meaning’ are simultaneously key to intelligence, but also very hard to study,” Mitchell told me. “Listening to people from psychology and neuroscience really drove home how complex intelligence is, not only in humans but also in other animals ranging from jumping spiders to grey parrots to our primate cousins. And also that we really don’t understand natural intelligence very well at all.”


