To understand language models, we must separate “language” from “thought”


This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

The conversation around large language models (LLMs) is becoming more polarized with the release of advanced models such as ChatGPT. Arguments range from LLMs being “thinking machines” to their being stupid programs that stitch together memorized text.

To clear up the confusion, we need a different framework for thinking about LLMs, argue researchers at the University of Texas at Austin and the Massachusetts Institute of Technology (MIT). In a paper titled “Dissociating language and thought in large language models: a cognitive perspective,” the researchers argue that to understand the power and limits of LLMs, we must separate “formal” from “functional” linguistic competence.

LLMs have made impressive advances on the former, but still have a lot of work to do on the latter, the researchers say. And this distinction can help clarify the discourse surrounding LLMs and find paths to build models that “understand and use language in human-like ways.”

Two common fallacies about language models

“As with anything, I think the way people view LLMs is influenced by their own background, training, and experience,” Kyle Mahowald, co-author of the paper and computational linguist at UT Austin, told TechTalks. “It’s exciting that LLMs are attracting interest from so many corners of academia: not just the NLP world, but also linguistics, neuroscience, philosophy, anthropology, sociology, political science, etc. This naturally leads to a diversity of opinions on LLMs and their capabilities. Of course, the same is true for us. That’s part of why we put ‘a cognitive perspective’ in the title of our paper.”

In the paper, the researchers explore two common fallacies related to language and thought. The first claims that an entity that is good at language is also good at thinking, which the scientists describe as the “good at language -> good at thought” fallacy. This fallacy leads to arguments that large language models are a step toward “thinking machines” and artificial general intelligence (AGI).

The second fallacy, called “bad at thought -> bad at language,” suggests that if a language model can’t fully capture the richness and sophistication of human thought, then it is not a good model of human language.

This line of thinking is marked by constant criticism of language models for their poor commonsense reasoning abilities and lack of consistent, generalizable world knowledge.

“These two fallacies really stem from the same misconception: equating language and thought,” Anna Ivanova, co-author of the paper and postdoctoral researcher at MIT, told TechTalks. “One reason why this is a natural mistake to make is that, in real life, we do not have access to another person’s thoughts. If we want to know how good someone is as a thinker, often the best we can do is ask them a question and listen to their response.”

The paper suggests that we can avoid these fallacies if we distinguish between formal and functional linguistic competence.

LLMs and formal linguistic competence


Formal linguistic competence encompasses the capacities required to produce and comprehend a given language. It includes knowledge of linguistic rules as well as statistical regularities that can’t be captured by rules.

The transformer architecture, which is the basis for today’s LLMs, has proven to be a very good tool for modeling formal linguistic competence. The transformer uses multiple layers of neurons, attention mechanisms, and parallel processing to perform very accurate “next word” prediction.
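The core attention computation is compact enough to sketch in plain Python. The toy below implements scaled dot-product self-attention over three two-dimensional token vectors; it is only an illustration of the mechanism, not a real model, which would add learned projection matrices, multiple heads, feed-forward layers, and a final softmax over the vocabulary for next-word prediction.

```python
import math

def softmax(xs):
    # numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query position produces a
    weighted mix of the value vectors, weighted by query-key similarity."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# toy sequence of 3 token vectors, used as queries, keys, and values
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(x, x, x)  # self-attention: each position attends to all others
```

Because every attention weight vector sums to one, each output row is a convex combination of the value vectors, which is why stacking many such layers lets each position absorb context from the whole sequence in parallel.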

Given enough training data, a large transformer model can generate long sequences of text with consistent linguistic features. For example, LLMs perform long-distance number agreement above chance (though they still fall short of human performance on nested sentences). They also handle subject-verb agreement, wh- questions, and other aspects of grammar that earlier approaches usually failed at.
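A standard way to test agreement is with minimal pairs: score a grammatical sentence against its agreement-violating twin and check which one the model prefers. The sketch below does not call a real LM; the hypothetical `nearest_noun_baseline` stands in for a weak model that uses the shortcut of agreeing with the closest noun, which is exactly what breaks on long-distance dependencies with an intervening “attractor” noun.

```python
# Minimal-pair agreement probe (sketch). A real probe would compare an
# LM's log-probabilities for the two sentences; here a deliberately
# weak baseline agrees the verb with the *nearest* preceding noun.

SINGULAR = {"key", "cabinet", "pipe"}
PLURAL = {"keys", "cabinets", "pipes"}

def nearest_noun_baseline(sentence):
    """Accept the sentence iff the verb agrees with the closest noun."""
    words = sentence.split()
    verb_idx = words.index("are") if "are" in words else words.index("is")
    for w in reversed(words[:verb_idx]):
        if w in SINGULAR:
            return words[verb_idx] == "is"
        if w in PLURAL:
            return words[verb_idx] == "are"
    return False

# Head noun "keys" is plural, but the attractor "cabinet" is singular:
good = "the keys to the cabinet are on the table"
bad = "the keys to the cabinet is on the table"
# The nearest-noun shortcut prefers the ungrammatical sentence:
assert nearest_noun_baseline(bad) and not nearest_noun_baseline(good)
```

Performing above chance on such pairs is evidence that a model tracks the head noun across intervening material rather than relying on this local shortcut.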

“Although it is tempting to move the goalposts and focus on what these models are still unable to do… we argue that the remarkable advances in LLMs’ ability to capture various linguistic phenomena should not be overlooked,” the researchers write.

“It seemed to us like it’s still not fully appreciated how impressive LLMs are at what we call ‘formal linguistic competence,’” Mahowald said. “They can produce remarkably fluent language that gets a lot of really complex linguistic structure right. This is not nothing!”

At the same time, the researchers highlight the limits of formal linguistic competence in large language models. For example, LLMs can achieve good performance on benchmarks without learning the relevant linguistic information, such as hierarchical structure and abstract grammatical categories. In other words, “the models can be ‘right for the wrong reason’ and leverage certain features in the input that aren’t the ones being tested,” the researchers write.
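One way to see what “right for the wrong reason” means is a deliberately dumb heuristic on a made-up entailment dataset: a rule that only checks for the word “not” can score perfectly when negation happens to correlate with the contradiction label, then collapse once that correlation is broken. All sentences and labels below are invented for illustration; analyses of real NLI benchmarks make the same point with actual models.

```python
# A shortcut "model": predict contradiction whenever the hypothesis
# contains "not", entailment otherwise. It never looks at meaning.

def negation_heuristic(premise, hypothesis):
    return "contradiction" if "not" in hypothesis.split() else "entailment"

# Biased benchmark: negation happens to co-occur with "contradiction".
biased_benchmark = [
    (("a dog is running", "a dog is not running"), "contradiction"),
    (("a man is eating", "a man is eating food"), "entailment"),
]
# Challenge set: the spurious correlation is deliberately broken.
challenge_set = [
    (("the dog is not asleep", "the dog is not asleep"), "entailment"),
    (("a cat sits", "a cat runs"), "contradiction"),
]

def accuracy(dataset):
    hits = sum(negation_heuristic(p, h) == y for (p, h), y in dataset)
    return hits / len(dataset)

# Perfect on the biased set, complete failure on the challenge set.
```

This is why the authors caution against reading benchmark scores as proof that a model has learned the linguistic property the benchmark nominally tests.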

Large language models also require unrealistic amounts of data to achieve near-human performance. The researchers note that human language learners “likely rely on pre-existing biases in order to learn quickly from sparse and noisy input—biases that today’s state-of-the-art models lack.” An interesting direction of research would be inductive biases that can help LLMs learn faster and with less data, and architectures that can capture these biases.

LLMs and functional linguistic competence


Functional linguistic competence is about using language to do things in the world. We use language to send and receive information about our perceptual and cognitive systems, such as our senses and memory, to exercise social skills, and to solve real-world problems. These capacities are distinct from formal linguistic competence. As the scientists write in their paper, “A formal language system in isolation is useless to a language user unless it can interface with the rest of perception, cognition, and action.”

The large text corpora used to train LLMs contain a lot of non-linguistic information. This is why language models show impressive performance on certain benchmarks that evaluate logic and reasoning capabilities. Basically, if a scenario is common enough, the LLM can succeed. But if pushed a bit on tasks that require careful reasoning and planning, language models start to break.

Why is this important? “Real-life language use is impossible without non-linguistic cognitive skills. Understanding a sentence, reasoning about its implications, and deciding what to say—these skills all rely on cognitive capacities that go way beyond lexical semantics or syntax,” the researchers warn.

The paper discusses four key areas of functional linguistic competence: formal reasoning, world knowledge, situation modeling, and social reasoning. In all cases, LLMs show a degree of superficial ability that can be acquired by learning statistical regularities. But they lack the underlying knowledge that would enable them to perform consistently.

“Models that master many syntactic and distributional properties of human language still cannot use language in human-like ways,” the researchers write. “In other words, their functional language competence remains in its infancy.”

“We had the sense that some people took this linguistic success to mean that LLMs were knocking on the door of Artificial General Intelligence,” Mahowald said. “Drawing on cognitive science and cognitive neuroscience, we argue that there is more to human cognition than just fluent language—even while acknowledging that fluent language is a part of the package.”

“People who pay attention to LLM successes at mastering language rules and patterns jump to concluding that these models are learning to think,” Ivanova said. “People who pay attention to LLMs’ failures in language use dismiss them altogether, ignoring their successes at learning rules and patterns.”

Separating language and thought


The researchers argue that evidence from cognitive science and neuroscience shows that language and thought in humans are robustly dissociable.

For example, people who lose their linguistic abilities due to brain damage (a condition known as aphasia) can still maintain other cognitive skills, such as playing chess, composing music, and solving arithmetic problems. And fMRI studies show that the brain’s language network is highly active when people listen to, read, or generate sentences, but not when they perform arithmetic, logical reasoning, computer programming, and similar tasks.

“The machinery dedicated to processing language is separate from the machinery responsible for memory, reasoning, and social skills,” the researchers write.

Accordingly, the authors suggest, if we approach LLM research based on the separation of formal and functional linguistics, we can address some of the challenges that the field faces today. They provide several recommendations to guide the future of LLM research.

One solution is to introduce modularity to separate core language from cognitive skills. Modularity can be achieved through an architecture that combines components designed for different functions. It can also be emergent, where the underlying transformer model is designed in a way that allows individual, specialized modules to self-develop during training. “Whether built-in or induced to emerge, modularity can lead the models to mirror the functional organization of the human brain and, consequently, make their behavior much more humanlike,” the researchers write.
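As a loose illustration of built-in modularity (not the paper’s proposal; all names here are invented), one can imagine a router that dispatches a query to a specialized symbolic module when it recognizes the task and falls back to a language module otherwise:

```python
import re

def calculator_module(query):
    # Symbolic "reasoning" module: exact arithmetic, no statistics involved.
    a, op, b = re.fullmatch(r"(\d+)\s*([+*-])\s*(\d+)", query).groups()
    return str({"+": int(a) + int(b),
                "-": int(a) - int(b),
                "*": int(a) * int(b)}[op])

def language_module(query):
    # Stand-in for a neural LM that handles open-ended language.
    return f"[LM response to: {query!r}]"

def modular_model(query):
    # Router: send arithmetic to the calculator, everything else to the LM.
    if re.fullmatch(r"\d+\s*[+*-]\s*\d+", query):
        return calculator_module(query)
    return language_module(query)
```

The structure, not the toy components, is the point; the harder research question is the emergent variant the authors describe, in which training induces a single transformer to develop such specialized modules internally.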

A second solution is to move beyond training LLMs on large corpora of general text curated from the web. The authors propose developing specialized datasets for different tasks, using modular architectures that reflect the cognitive capabilities of humans, and training the models on diverse objective functions. One promising direction is reinforcement learning from human feedback (RLHF), the technique used to train ChatGPT.

Finally, the authors discuss the need for different benchmarks that evaluate formal and functional linguistic competence.

“Clearly identifying different capacities required for language use is step 1; building benchmarks and targeted tests to evaluate these capacities is step 2; improving the models in areas which tests identify as problematic is step 3,” Ivanova said.

Mahowald also expressed concern over decreasing transparency in the field. As startups and big tech compete over getting a larger share of the market for LLMs, they are becoming increasingly reluctant to make their research available to others.

“What I do care about is understanding human language and cognition, and I think LLMs are a really exciting tool for that insofar as they let us analyze a system with really interesting capabilities,” he said. “And so I hope that there is still a place for scientific inquiry using LLMs. If they become increasingly closed off and privatized (as we are seeing), that kind of open inquiry might not be possible. So I’m excited and hopeful about open source efforts to build LLMs for academic research.”
