ChatGPT: It can tell but does not know

Image source: 123RF (with modifications)

Polanyi’s paradox, named in honor of the philosopher and polymath Michael Polanyi, states that “we know more than we can tell.”[1] He means that most of our knowledge is tacit and cannot be easily formalized with words.[2] In The Tacit Dimension, Polanyi gives the example of recognizing a face without being able to tell what facial features humans use to make such a distinction. The example draws on Gestalt psychology, which emerged in the early twentieth century as a theory of perception that rejected the basic principles of elementalist and structuralist psychology as well as functionalist and behaviorist theories of the mind. Gestalt theory emphasizes that conscious humans perceive entire patterns or configurations, not individual components, and cannot always explain what they know. Consider the ancient Chinese game Go, where nobody can define a good move. As the professional Go player Michael Redmond once explained, “I’ll see a move and be sure it’s the right one, but won’t be able to tell you exactly how I know. I just see it.”

In my book Doing AI: A Business-Centric Examination of AI Culture, Goals, and Values, I consider the impact of Polanyi’s paradox on problem-solving. I explain that knowledge of complex problems is often minuscule and getting (or giving) help is challenging. If someone wants your support but cannot tell you how exactly to help, they are working on a complex problem. I vividly recall a “top talkers” response (the hosts that generate the most network traffic) from cybersecurity analysts when I asked them for help designing threat detection algorithms. Despite the importance of implicit knowledge for threat hunting, “top talkers” is easy to codify, so “top talkers” is what I was told. This is not to poke fun at security professionals. In fact, we should not confuse “we know more than we can tell” with “humans who cannot tell, do not know.” Polanyi is careful to say, “we know more than we can tell,” not that we don’t know things we cannot tell.
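To illustrate why “top talkers” is so easy to codify, here is a minimal Python sketch. The flow records, their field order (source, destination, bytes), and the function name are hypothetical and chosen only for illustration; real network telemetry would look different.

```python
from collections import Counter

def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes."""
    totals = Counter()
    for src, _dst, num_bytes in flows:
        totals[src] += num_bytes
    return totals.most_common(n)

# Hypothetical flow records: (source, destination, bytes sent)
flows = [
    ("10.0.0.5", "10.0.0.9", 1200),
    ("10.0.0.7", "10.0.0.9", 300),
    ("10.0.0.5", "10.0.0.8", 800),
    ("10.0.0.3", "10.0.0.9", 50),
]

print(top_talkers(flows, n=2))
```

The whole rule fits in a few lines, which is exactly why it is the first thing people can tell you. The tacit judgment an analyst applies when hunting threats has no comparably short description.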

Of course, being taciturn produces some tricky effects on problem-solving since problem specification requires the ability to describe a problem. You can’t define something you don’t know, nor can you readily learn something from someone who can’t tell.

When we need the most help, we are often in the weakest position to ask for it. Plato pointed out this contradiction in Meno: either you know what you are looking for, meaning you don’t have to look, or you don’t and cannot expect to find it. This is one of the essential explanations behind the adoption of machine learning. Machine learning relaxes the requirement to explain a problem in painstaking detail. Instead, problem-solvers label training data, engineer features, or design models. There may be no other development like machine learning that promises such a positive impact on problem-solving.

Yet we oversteer and purge explicit knowledge in favor of learning everything from data. Only some problems need machine learning, and only some things need to be learned from data. Recently, I’ve seen examples of reinforcement learning used to solve the Rubik’s Cube and convolutional neural networks to solve Sudoku. These efforts may check an “AI” box, but they are peculiar because we know how to solve these games with explicit rules. These examples convert knowledge into data. Even chess, where machines are better than humans, isn’t a game humans need help understanding or solving. As a general rule of thumb, problems become harder to solve as the tacit knowledge involved increases, and they generally get easier once they have already been solved.
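As a rough illustration of solving Sudoku with explicit rules alone, the sketch below uses plain backtracking over the row, column, and box constraints. It is one common rule-based approach, not the only one, and the function names and sample puzzle are illustrative choices. No training data is involved.

```python
def candidates(grid, r, c):
    """Digits allowed at (r, c) under the row, column, and box rules."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    """Fill empty cells (0) in place by backtracking; return True if solved."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                grid[r][c] = 0  # undo and let the caller try another digit
                return False
    return True  # no empty cell left

puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

solved = solve(puzzle)
for row in puzzle:
    print(row)
```

Every line of this solver states something we already know about the game. A model trained on solved grids would have to rediscover the same three constraints from examples.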

The data-information-knowledge-wisdom (DIKW) pyramid shows the logical progression of data, information, knowledge, and wisdom, generally suggesting (albeit simplistically) how essential components like information need data, knowledge needs information, and wisdom needs knowledge. Each category provides more value than the last because our understanding becomes more explicit. You may have even used DIKW to advocate for machine learning and enriching data to support higher-quality human understanding. I have. Yet, the examples above (e.g., Rubik’s Cube and Sudoku) convert explicit knowledge to data so that machine learning can produce implicit knowledge.[3] 

[Figure: the data-information-knowledge-wisdom (DIKW) pyramid]

Ultimately, knowing something about the problem is okay. If you can describe a problem, then heuristics, feature engineering, simple models, rules, and knowledge bases are ways problem-solvers put aspects of a problem they understand into a solution so others will also understand the solution. If you know it but can’t tell it, you may get lucky and be able to acquire data and use machine learning to say it for you. Exceptional technical leadership begins by knowing the difference, not getting machine learning to tell something you can already say.

This is an important lesson that ChatGPT learned and that its predecessors, Microsoft Tay and Meta’s Galactica, did not. As Gary Marcus explains in an article humorously subtitled Nightmare on LLM Street: “ChatGPT has guardrails, and those guardrails, most of the time, keep ChatGPT from erupting the way Galactica [or Microsoft Tay] did.” These guardrails are explicit rules that govern ChatGPT’s behavior (or, more aptly, misbehavior). The process of formalizing implicit or explicit knowledge to the exclusion of the other is self-defeating.[4] Nevertheless, the results from machine learning do not expressly (or even vaguely) contain explicit knowledge. ChatGPT has no knowledge or language understanding. The explicit knowledge is only at the edges to prevent the chatbot from being completely misguided.
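The idea of explicit knowledge sitting “at the edges” of a learned model can be sketched in a few lines of Python. This is purely illustrative and makes no claim about how ChatGPT’s guardrails are actually implemented; the blocked-topic list, the stand-in model, and all names here are hypothetical.

```python
BLOCKED_TOPICS = ("weapon", "malware")  # hypothetical explicit rules
REFUSAL = "I can't help with that request."

def fake_model(prompt):
    """Stand-in for a learned text generator; not a real model."""
    return f"Here is some fluent text about {prompt}."

def violates_rules(text):
    """Explicit, human-written check applied to any piece of text."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def guarded_reply(prompt, model=fake_model):
    """Wrap the learned component with rules on its input and output."""
    if violates_rules(prompt):
        return REFUSAL
    reply = model(prompt)
    if violates_rules(reply):
        return REFUSAL
    return reply

print(guarded_reply("gardening"))
print(guarded_reply("how to write malware"))
```

The rules constrain what goes in and what comes out, but nothing inside the wrapped model becomes explicit as a result.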

Polanyi is careful to say, “we know more than we can tell,” not that we don’t know because we cannot tell.[5] However, ChatGPT is Polanyi’s paradox in reverse. It can tell, but it doesn’t know. As the model responded to me in one chat, “I will always do my best to provide a helpful and accurate response,” followed by, “I don’t have the ability to know what I don’t know.” This bit of Rumsfeldian wisdom is a reality for everyone but ChatGPT, which doesn’t even know what it does know. The consequence for users is Meno’s paradox: for questions where a user knows the answer, ChatGPT is [at best] trivial, and for questions they don’t, users will be unable to recognize if it is correct, and neither will ChatGPT. The mysterious origins of knowledge make ChatGPT’s value fickle. It is hard to believe that when we need the most help, we can get it from something that doesn’t know.

[1] When paired with cognitive biases like the Dunning-Kruger effect, Polanyi’s paradox becomes, for some, that we know less than we think we can tell and often think we know more than we can tell.

[2] The word tacit comes from the Latin tacitus, which means achieved without words. The adjective taciturn refers to an uncommunicative person.

[3] Many attribute the idea of the DIKW to T.S. Eliot in the pageant play The Rock, specifically, two lines in the poem “Choruses”:

Where is the wisdom we have lost in knowledge? 

Where is the knowledge we have lost in information?

[4] Still, friends of artificial intelligence like Richard Sutton and Geoffrey Hinton advocate against this hybrid approach with declarations such as the Bitter Lesson and Intelligent Design, which Gary Marcus [generally] opposes.

[5] I recall a defense of black-box machine learning being that humans are black boxes too. This is false (logically speaking) and an example of tu quoque (“you too”), an ad hominem fallacy that responds to an allegation by saying, in essence, “you do the same thing,” while ignoring the alleged flaw. It is also false by Polanyi’s standards. He emphasized that we know more than we can tell, not that we don’t know because we cannot tell.
