Welcome to AI book reviews, a series of posts that explore the latest literature on artificial intelligence.
Today, deep learning is the talk of the town. There is no shortage of media coverage, papers, books, and events on deep learning.
Yet deep learning is not new. Its roots go back almost to the early days of artificial intelligence and computing. While the field received the cold shoulder for decades, there were a few scientists and researchers who plodded forward, keeping faith that the idea of artificial neural networks would one day bear fruit.
And we are seeing the fruits of deep learning in everyday applications, such as search, chat, email, social media, and online shopping.
One of these scientists is Terrence Sejnowski, a pioneer in the field of computational neuroscience and a long-time researcher of artificial neural networks. In his book The Deep Learning Revolution, Sejnowski reflects on the history of the field.
In an interview with TechTalks, Sejnowski discussed the early struggles of deep learning, its explosion into the mainstream, and the lessons learned from decades of research and development.
Mapping a different path for AI
Sejnowski became interested in artificial intelligence during the heyday of symbolic AI. At the time, scientists and engineers tried to create AI by hardcoding rules and knowledge into computers.
Sejnowski was part of the “connectionist” camp, a small group of scientists who were inspired by the biological structure of natural intelligence. Connectionist AI focuses on areas such as machine learning and especially artificial neural networks. The idea behind connectionism is to replicate how large populations of neurons interact to learn representations of different things and concepts.
In the early decades of AI, symbolism received most of the attention and connectionism fell by the wayside. Symbols are high-level, abstract representations that are easy to understand and codify into computer programs. Symbolic AI made quick progress on difficult logical problems such as performing complicated math and playing chess.
“If you look back at the early days of AI… the idea was that we have computers that could run rings around nature,” Sejnowski told TechTalks. “We didn’t actually have to pay attention to how nature solved the problem.”
But symbolic AI also ignores some of the important but basic details of intelligence. In Deep Learning Revolution, Sejnowski writes, “The beauty of symbols is that they allow us to compress complex ideas and manipulate them; the problem with symbols is that they are so compressed that it is difficult to ground them in the real world.”
For example, the word “chair” is a symbol that stands for all kinds of chairs, regardless of how they look, how many legs they have, whether they have armrests or not, whether they have wheels or not, or even if they are hewn in the side of a cliff. Capturing all these aspects of chairs in symbols is very difficult. Things become even more difficult when you enter fields such as vision. It is virtually impossible to create rule-based programs that can detect all kinds of chairs from different angles.
“It seemed to us that the only existing proof that any of these problems could be solved—you know, speech recognition, language, all of the really difficult problems back then that people were working on and not making much progress—was that nature had solved them,” Sejnowski said. “And so what we set out to do was to try to understand at least the basic principles that were underlying the incredible performance that not just humans but animals had in general.”
Lessons learned from nature
In the 1980s, as Sejnowski and other connectionists such as Geoffrey Hinton persisted with neural networks, they were told they were on a fool’s errand. Symbolic AI dominated the most prestigious universities and research labs in the U.S. Other scientists believed that neural networks, an idea introduced back in the 1950s by Frank Rosenblatt, were a dead end.
“They had good reasons,” Sejnowski said. “It wasn’t just because they didn’t like what we were doing. They were smart people. They were mathematicians. These were people who knew a lot about statistics, sample complexity, and how many examples you need in order to be able to constrain a model with a certain number of parameters. We were told that the models would overfit. We were also told by the optimization experts that this is a non-convex problem. You’re never going to be able to find an optimal solution.”
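The critics’ overfitting argument is easy to see in a toy experiment (an illustrative Python sketch, not something from the book or the interview): give a model as many parameters as training points and it will fit the training data perfectly while straying from the underlying function everywhere else.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five noisy samples of a simple underlying function.
x_train = np.linspace(0, 1, 5)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=5)

# A degree-4 polynomial has 5 coefficients -- as many parameters as
# data points -- so it interpolates the training set almost exactly.
coeffs = np.polyfit(x_train, y_train, deg=4)
train_error = np.max(np.abs(np.polyval(coeffs, x_train) - y_train))

# But away from the training points it tracks the noise, not the signal.
x_test = np.linspace(0, 1, 101)
test_error = np.max(np.abs(np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)))

print(f"max train error: {train_error:.2e}")  # essentially zero
print(f"max test error:  {test_error:.2e}")   # much larger
```

This was the textbook intuition: more parameters than data means memorization, not generalization. What the critics could not foresee was that this intuition would break down at the scale of modern deep networks.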
But as symbolic AI continued to hit roadblocks, deep learning started making progress, albeit small at first. Over time, the efforts of the connectionists paid off. Advances in technology enabled researchers to create very large neural networks and train them on numerous examples. It eventually became clear that deep learning models with many parameters could learn functions that were previously thought impossible. And they could solve many problems that did not have a clear solution through the symbolic approach, such as image classification and language processing.
“It wasn’t until computers got much faster and better at doing multiplies and adds that it became possible to simulate these larger networks that allowed us to explore. And what we discovered was that what the experts are telling us didn’t seem to pan out,” Sejnowski said.
Contrary to what the earlier generation of AI scientists had predicted, large neural networks did not get stuck in local minima. And they could generalize well enough to avoid the kind of overfitting that scientists had warned against. Today, deep learning has not only proven its scientific viability, but it has also become a mainstay of many major applications—though it has its own set of problems to solve.
Thinking back on the decades of neglect that deep learning received, Sejnowski said, “In retrospect, the problem here was that people didn’t have any good metrics to measure the difficulty of the problem. And the reason is that it’s all hidden from us. We are not aware of how we see. And there’s no reason why nature should burden us with that.”
The brain is a complex, high-dimensional engine that processes vast amounts of sensory data and integrates them with past experience and memory. What our intuitions tell us about how the brain works is an abstract, low-dimensional explanation we construct for ourselves, one that doesn’t account for all the processing happening under the hood.
“The bottom line is you cannot trust your intuition. And what people in AI were trying to do was to automate by writing a program what their intuition told them intelligence was,” Sejnowski said.
Artificial intelligence meets human intelligence
As a neuroscientist, Sejnowski has very interesting observations on natural and artificial intelligence. In The Deep Learning Revolution, he writes, “The Deep Learning Revolution has two intertwined themes: how human intelligence evolved and how artificial intelligence is evolving. The big difference between the two kinds of intelligence is that it took human intelligence many millions of years to evolve, but artificial intelligence is evolving on a trajectory measured in decades. Although this is warp speed even for cultural evolution, fastening our seat belts may not be the right response.”
Interestingly, Sejnowski’s book was published in 2018, before the explosion in generative models and large language models. The things that deep learning does today are surprising even by the standards of a few years ago. Today, it seems that even “measured in decades” is an understatement.
“What we’ve seen in the last few years is a fantastic exploration that is like Lewis and Clark going into the wilderness and discovering things,” Sejnowski said. “It seems to be accelerating. But I think this is just a natural kind of trajectory that is on top of the exponential increase in computer power. There is no other technology that has grown exponentially over such a long period of time.”
At the same time, we have seen some developments that have made a great difference in the field. One of them is the invention of the transformer, the main architecture used in LLMs. Transformers are the culmination of decades of research in different areas of deep learning. They are extremely efficient in processing sequences of data, such as text, software code, patches of image pixels, or molecular data.
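To make “processing sequences” concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside transformers (a simplified single-head version in plain NumPy; real implementations add multiple heads, masking, positional encodings, and stacked layers):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) array of token embeddings.
    Returns one output vector per input position, each a weighted
    mix of the value vectors of every position in the sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # a toy 4-token sequence
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                         # one mixed vector per token
```

Because the score matrix relates every position to every other position in a single matrix multiply, the whole sequence is handled in parallel, which is part of what makes the architecture so efficient on text, code, image patches, and other sequence data.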
“Transformers were not something that I expected. I don’t think anybody expected them. It was a fantastic architecture,” Sejnowski said.
But what makes these developments especially exciting is the way they are feeding back into their origins. For much of the history of deep learning, scientists studied the brain to draw inspiration for computer models. Now, advances in deep learning are helping find new ways to study the brain.
“For me, the most exciting is that for the first time in AI, we’re getting neuroscientists talking to people, to engineers, and computer scientists in AI, back and forth, because they now have a common vocabulary,” Sejnowski said. “And in my own research, we’re using those tools to understand the brain, to analyze recordings from the brain. There’s been a tremendous revolution just within the last ten years, in terms of the tools that we have for recording from neurons, reconstructing networks, connectomics. This has really transformed neuroscience.”
Deep learning and embodiment
While deep neural networks have made impressive advances in recent years, they also have fundamental flaws that need fixing. We can see these flaws in many applications of deep learning, such as adversarial examples in computer vision systems or elementary mistakes in LLMs.
One of the explanations for the shortcomings of artificial neural networks is their lack of embodiment. For example, LLMs imitate some of the advanced features of human intelligence without the rich sensorimotor experience that humans have.
In The Deep Learning Revolution, Sejnowski explains how embodiment and constant learning are pivotal to human intelligence: “Our brains develop through a long process of maturation while interacting with the world. Learning is a process that coincides with development and continues long after we reach adulthood. Learning is therefore central to the development of general intelligence.”
In a recent paper, Sejnowski lists seven elements missing from LLMs. One of them is the lack of “direct sensory experience within the world.” But it also goes beyond that.
Other shortcomings of current AI systems, including the lack of common sense and causal reasoning, are also deeply linked to world experience and lifelong learning. Emotions and empathy, which are often ignored in AI, are essential aspects of intelligence as well.
“Cognition and emotion have traditionally been considered separate functions of the brain. It was generally thought that cognition was a cortical function and emotions were subcortical. In fact, there are subcortical structures that regulate emotional states, structures like the amygdala, which is engaged when the emotional levels are high, especially fear, but these structures interact strongly with the cerebral cortex,” Sejnowski writes in The Deep Learning Revolution.
“These large language models are just focusing on the cortical architecture. And they’re missing all the rest of the brain, all those subcortical areas that are there for survival that were perfected long before the cognitive layer was put on top,” Sejnowski said. “And so of course, you shouldn’t expect this thin layer on its own to be able to understand grounding.”
This is a problem that has also existed in previous generations of AI systems, which were not grounded in real-world experience and were built on logical structures. Today’s AI systems are able to replicate many human-like behaviors. However, without the foundations that organic brains have, they make mistakes that are very different from the kind of mistakes that humans make.
“What’s remarkable to me is that the language models did so well without having that foundation. And it’s really important for us to isolate and understand the differences,” Sejnowski said. “I think the differences are going to be very telling between the mistakes that large language models make and the mistakes that we make. But this is good. This is how you make progress, by identifying those differences and understanding why they’re there and what’s missing.”