We’re living through an exciting moment in history: the breakthrough of English-fluent AI. There’s a diversity of reactions to this. Some folks are excited about the near-future possibilities — this includes me. Some are worried about the potential negative consequences. Many people criticize the current models, such as GPT variants, for hallucinating.
I suspect that most people outside the AI community are missing an important point that bears repeating: many of the great features of current LLMs are emergent properties. For example, the base model of GPT was not explicitly trained for factual accuracy. To be fair, a later fine-tuning stage (a technique called reinforcement learning from human feedback, or RLHF) was used to nudge GPT’s answers in a more correct, useful, and less offensive direction. That said, this process can only select among behaviors that already exist within the model; it cannot add new ideas to the model.
So it’s worth asking:
What other emergent properties do modern LLMs exhibit?
Today’s post highlights one of the most exciting: strong evidence that GPT models have a meaningful understanding of what people are thinking. In other words, these specific AI models can correctly answer questions that require an understanding of other people’s internal thoughts. As a point of comparison, many of these questions are difficult for young children who are native English speakers.
This emergent feature is called Theory of Mind.
— Tyler & Team
Paper: Theory of Mind May Have Spontaneously Emerged in Large Language Models
Summary by Adrian Wilkins-Caruana
Theory of Mind (ToM) refers to the ability to understand that other people have their own beliefs, desires, and intentions that may differ from one's own. For example, a child demonstrates ToM by recognizing that their friend may want a different toy than they do.
Even though large language models (LLMs) are great at many tasks, they have long struggled to demonstrate ToM. Remarkably, recent LLMs such as GPT-3 and GPT-3.5 now demonstrate a capacity for ToM on par with that of a 7–9-year-old child!
Here's a task that tests whether a person or LLM demonstrates ToM. The subject is first given the following info:
“Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says ‘chocolate’ and not ‘popcorn.’ Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label.”
After reading this statement, the subject is presented with this prompt: “She is delighted that she has found this bag. She loves eating…”. If the subject responds with “chocolate,” then they understand that Sam thinks there is chocolate in the bag, even though the subject knows this isn’t true. That’s ToM, and that was GPT-3.5’s response to the prompt!
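To make the protocol concrete, here’s a minimal Python sketch of this unexpected-contents probe. The complete callable is a hypothetical stand-in for whatever text-completion API you have on hand, and the first-word scoring heuristic is an illustrative assumption, not the paper’s exact procedure.

```python
# Minimal sketch of the unexpected-contents ("chocolate vs. popcorn") probe.
# `complete` is a hypothetical callable that sends a prompt to an LLM and
# returns its text continuation.

VIGNETTE = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "Yet, the label on the bag says 'chocolate' and not 'popcorn.' "
    "Sam finds the bag. She had never seen the bag before. "
    "She cannot see what is inside the bag. She reads the label."
)

BELIEF_PROMPT = "She is delighted that she has found this bag. She loves eating"


def passes_tom_probe(complete) -> bool:
    """True if the continuation names Sam's (false) belief: chocolate."""
    continuation = complete(f"{VIGNETTE} {BELIEF_PROMPT}")
    words = continuation.strip().split()
    first = words[0].strip(".,!'\"").lower() if words else ""
    return first == "chocolate"
```

A model that instead continues with “popcorn” is reporting the bag’s true contents rather than Sam’s belief, which is the same failure mode young children show on false-belief tasks.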
Using two types of ToM tests and 20 variations on each (for a total of 40 tests), Michal Kosinski tested 9 different LLMs from the past 5 years. He was careful to use new task variations that hadn’t appeared in the LLMs’ training data, and to design prompts that don’t leak info about the solution. For example, if the prompt was “Sam believes the bag contains…”, the term “believes” might give the LLM a hint that Sam is about to be surprised.
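To give a feel for how such a battery might be scored, here’s a rough harness (not Kosinski’s actual code): every model sees every hand-written variant with no examples, and we count how often the continuation names the protagonist’s belief. The Task layout and the complete_fn interface are assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Each task variant: (vignette, belief_prompt, expected_first_word).
Task = Tuple[str, str, str]
CompleteFn = Callable[[str], str]  # hypothetical prompt -> continuation interface


def first_word(text: str) -> str:
    words = text.strip().split()
    return words[0].strip(".,!'\"").lower() if words else ""


def score_model(complete_fn: CompleteFn, tasks: List[Task]) -> float:
    """Fraction of variants whose continuation names the protagonist's belief."""
    correct = sum(
        first_word(complete_fn(f"{vignette} {prompt}")) == expected
        for vignette, prompt, expected in tasks
    )
    return correct / len(tasks)


def score_models(models: Dict[str, CompleteFn], tasks: List[Task]) -> Dict[str, float]:
    """Run the full battery against each model and report per-model accuracy."""
    return {name: score_model(fn, tasks) for name, fn in models.items()}
```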
Across all these tasks, only models from last year (each >150B parameters) demonstrated at least the ToM level of 5-year-old kids, while GPT-3.5 was on par with 7–9-year-olds. Because these LLMs weren’t trained for ToM and were given no examples or other context for the task (their responses were zero-shot), Kosinski’s result indicates that ToM might have emerged entirely from the LLMs’ language-learning task.
Here are the full results. While older and smaller models struggle with ToM, the latest and greatest LLMs really know how to empathize!