Metacognitive Myopia in Large Language Models

Florian Scholten, Tobias R. Rebholz, Mandy Hütter
2024-08-10
Abstract: Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally inherent stereotypes, cloud moral judgments, or amplify positive evaluations of majority groups. Previous explanations mainly attributed bias in LLMs to human annotators and the selection of training data. Consequently, they have typically been addressed with bottom-up approaches such as reinforcement learning or debiasing corpora. However, these methods only treat the effects of LLM biases by indirectly influencing the model architecture, but do not address the underlying causes in the computational process. Here, we propose metacognitive myopia as a cognitive-ecological framework that can account for a conglomerate of established and emerging LLM biases and provide a lever to address problems in powerful but vulnerable tools. Our theoretical framework posits that a lack of the two components of metacognition, monitoring and control, causes five symptoms of metacognitive myopia in LLMs: integration of invalid tokens and embeddings, susceptibility to redundant information, neglect of base rates in conditional computation, decision rules based on frequency, and inappropriate higher-order statistical inference for nested data structures. As a result, LLMs produce erroneous output that reaches into the daily high-stakes decisions of humans. By introducing metacognitive regulatory processes into LLMs, engineers and scientists can develop precise remedies for the underlying causes of these biases. Our theory sheds new light on flawed human-machine interactions and raises ethical concerns regarding the increasing, imprudent implementation of LLMs in organizational structures.
Artificial Intelligence, Computation and Language, Computers and Society, Applications
What problem does this paper attempt to address?
The paper examines potentially harmful biases in large language models (LLMs). The authors point out that existing explanations mostly attribute these biases to human annotators and the selection of training data. The commonly adopted remedies (such as reinforcement learning or debiasing corpora) influence the model only indirectly: they treat the effects of the biases rather than their underlying causes in the computational process. The paper proposes a new cognitive-ecological framework, metacognitive myopia, to account for a range of established and emerging LLM biases and to provide a theoretical basis for addressing them.

The metacognitive myopia framework posits that LLMs lack the two key components of metacognition: monitoring (assessing the effectiveness of model behavior) and control (adjusting model behavior to correct misinformation). This lack leads to five symptoms:

1. **Integration of invalid information**: LLMs tend to accept information from any source as true, including sources that are clearly unreliable.
2. **Susceptibility to redundant information**: When the same information appears multiple times in the training dataset, LLMs give it undue weight.
3. **Neglect of base rates in conditional computation**: LLMs ignore background frequencies when performing conditional probability calculations.
4. **Frequency-based decision rules**: LLMs tend to decide based on how often something occurs rather than on the quality or relevance of the information.
5. **Inappropriate higher-order statistical inference**: LLMs may draw inappropriate statistical inferences from nested data structures.

By introducing metacognitive regulatory processes into LLMs, engineers and scientists can develop precise remedies that target the underlying causes of these biases.
The framework also sheds light on flawed human-machine interactions and raises ethical concerns about the increasing, imprudent implementation of LLMs in organizational structures.
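
To make the third symptom above (ignoring base rates in conditional computation) concrete, the following is a minimal sketch of Bayes' rule with and without the base rate. It is not taken from the paper: the function name `posterior` and all numbers are hypothetical, and modeling base-rate neglect as an implicit 50/50 prior is an assumption made purely for illustration.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) from the base rate P(H) and the two likelihoods."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers, chosen only for illustration.
base_rate = 0.01         # P(H): the hypothesis is rare
hit_rate = 0.90          # P(E | H)
false_alarm_rate = 0.10  # P(E | not H)

# Correct computation: the base rate enters as the prior.
with_base_rate = posterior(base_rate, hit_rate, false_alarm_rate)

# Illustrative assumption: neglecting the base rate is modeled as a uniform prior.
neglecting_base_rate = posterior(0.5, hit_rate, false_alarm_rate)

print(f"P(H | E) with base rate:      {with_base_rate:.3f}")
print(f"P(H | E) neglecting base rate: {neglecting_base_rate:.3f}")
```

With these made-up inputs, the same evidence supports a posterior of roughly 8% when the base rate is respected and 90% when it is ignored, which is the kind of gap the framework attributes to missing monitoring and control.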