Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models

Courtland Leer, Vincent Trost, Vineeth Voruganti
DOI: https://doi.org/10.48550/arXiv.2310.06983
2023-10-11
Abstract: Recent research shows that Large Language Models (LLMs) exhibit a compelling level of proficiency in Theory of Mind (ToM) tasks. This ability to impute unobservable mental states to others is vital to human social cognition and may prove equally important in principal-agent relations between individual humans and Artificial Intelligences (AIs). In this paper, we explore how a mechanism studied in developmental psychology known as Violation of Expectation (VoE) can be implemented to reduce errors in LLM prediction about users by leveraging emergent ToM affordances. And we introduce a metacognitive prompting framework to apply VoE in the context of an AI tutor. By storing and retrieving facts derived in cases where LLM expectation about the user was violated, we find that LLMs are able to learn about users in ways that echo theories of human learning. Finally, we discuss latent hazards and augmentative opportunities associated with modeling user psychology and propose ways to mitigate risk along with possible directions for future inquiry.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses prediction error in large language models (LLMs) on Theory of Mind (ToM) tasks. Specifically, the authors use metacognitive prompting to implement Violation of Expectation (VoE), a mechanism studied in developmental psychology, in order to reduce the errors LLMs make when predicting things about users. By storing and retrieving facts derived in cases where the LLM's expectation about the user was violated, the authors find that LLMs can learn about users in ways that echo theories of human learning. The paper also discusses the latent hazards and augmentative opportunities of modeling user psychology, and proposes ways to mitigate risk along with directions for future research. The main objectives of the paper are:
1. To demonstrate the general utility of the metacognitive prompting framework in reducing ToM prediction errors in a specific application, Bloom (a free AI tutor).
2. To discuss in depth the opportunities for future work, including the practical and philosophical significance of this emerging ability and how confidential computing environments could be used to secure these mental renderings.
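The store-and-retrieve loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation or Bloom's code: the names `FactStore` and `voe_step`, the word-overlap retrieval, and the three `toy_*` functions (which stand in for LLM calls) are all hypothetical and assumed only for illustration. The sketch shows one plausible shape for the cycle: retrieve known facts, predict the user's next message, compare the prediction with what the user actually said, and store a new fact when the expectation is violated.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FactStore:
    """Hypothetical in-memory store for facts learned about one user."""
    facts: List[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        # Skip empty strings and exact duplicates.
        if fact and fact not in self.facts:
            self.facts.append(fact)

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Toy retrieval by word overlap; a real system would more likely
        # use embedding similarity (assumption, not from the paper).
        words = set(query.lower().split())
        ranked = sorted(
            self.facts,
            key=lambda f: len(words & set(f.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def voe_step(
    history: List[str],
    actual_user_msg: str,
    store: FactStore,
    predict: Callable[[List[str], List[str]], str],
    is_violation: Callable[[str, str], bool],
    derive_fact: Callable[[str, str], str],
) -> bool:
    """Run one Violation of Expectation cycle; return True if violated."""
    known = store.retrieve(history[-1] if history else "")
    expected = predict(history, known)          # expectation formed first
    violated = is_violation(expected, actual_user_msg)
    if violated:
        # Learn only from the cases where the prediction was wrong.
        store.add(derive_fact(expected, actual_user_msg))
    return violated


# Toy stand-ins for the LLM calls (purely illustrative).
def toy_predict(history: List[str], facts: List[str]) -> str:
    if any("prefers examples" in f for f in facts):
        return "user asks for an example"
    return "user asks for a definition"


def toy_is_violation(expected: str, actual: str) -> bool:
    return ("example" in expected) != ("example" in actual.lower())


def toy_derive_fact(expected: str, actual: str) -> str:
    return "user prefers examples over definitions"


store = FactStore()
first = voe_step(["What is recursion?"], "Can you show me an example?",
                 store, toy_predict, toy_is_violation, toy_derive_fact)
second = voe_step(["Explain closures."], "Could I see an example?",
                  store, toy_predict, toy_is_violation, toy_derive_fact)
```

In this toy run the first prediction is violated, so a fact is stored; the second prediction draws on that fact and is not violated. Passing the three steps in as callables keeps the control flow visible while leaving open how a real system would prompt a model for each one.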