Abstract: Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally inherent stereotypes, cloud moral judgments, or amplify positive evaluations of majority groups. Previous explanations mainly attributed bias in LLMs to human annotators and the selection of training data. Consequently, they have typically been addressed with bottom-up approaches such as reinforcement learning or debiasing corpora. However, these methods only treat the effects of LLM biases by indirectly influencing the model architecture, but do not address the underlying causes in the computational process. Here, we propose metacognitive myopia as a cognitive-ecological framework that can account for a conglomerate of established and emerging LLM biases and provide a lever to address problems in powerful but vulnerable tools. Our theoretical framework posits that a lack of the two components of metacognition, monitoring and control, causes five symptoms of metacognitive myopia in LLMs: integration of invalid tokens and embeddings, susceptibility to redundant information, neglect of base rates in conditional computation, decision rules based on frequency, and inappropriate higher-order statistical inference for nested data structures. As a result, LLMs produce erroneous output that reaches into the daily high-stakes decisions of humans. By introducing metacognitive regulatory processes into LLMs, engineers and scientists can develop precise remedies for the underlying causes of these biases. Our theory sheds new light on flawed human-machine interactions and raises ethical concerns regarding the increasing, imprudent implementation of LLMs in organizational structures.

What problem does this paper attempt to address?

The paper addresses potentially harmful biases in large language models (LLMs). The authors point out that existing explanations mostly attribute these biases to human annotators and the selection of training data. The commonly adopted remedies (such as reinforcement learning or debiasing corpora) treat only the effects of these biases by indirectly influencing the model architecture, and do not address the underlying causes in the computational process. The paper proposes a new theoretical framework, metacognitive myopia, to explain a range of established and emerging biases in LLMs and to provide a theoretical basis for addressing them.

The metacognitive myopia framework posits that LLMs lack two key components of metacognition: monitoring (assessing the effectiveness of model behavior) and control (adjusting model behavior to correct misinformation). This lack leads to five symptoms:

1. **Integration of invalid information**: LLMs tend to accept information from any source as true, including sources that are clearly unreliable.
2. **Susceptibility to redundant information**: When the same information appears multiple times in the training data, LLMs give it too much weight.
3. **Ignoring base rates in conditional computation**: LLMs ignore background frequencies when computing conditional probabilities.
4. **Frequency-based decision rules**: LLMs tend to decide based on how often something occurs rather than on the quality or relevance of the information.
5. **Inappropriate higher-order statistical inference**: LLMs may draw inappropriate statistical inferences when dealing with nested data structures.

By introducing metacognitive regulatory processes into LLMs, engineers and scientists can develop precise remedies that target the root causes of these biases. The framework also sheds light on flawed human-machine interactions and raises ethical concerns about the increasing, imprudent implementation of LLMs in organizational structures.
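To make the third symptom (ignoring base rates) concrete, here is a minimal sketch of what base-rate neglect means in a conditional probability calculation. It is not taken from the paper: the function name `posterior` and all numbers are hypothetical, chosen only for illustration. It compares the posterior computed with the true base rate against the posterior obtained when the base rate is implicitly treated as 50%.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Return P(H | E) via Bayes' rule for a binary hypothesis H and evidence E.

    prior            = P(H), the base rate
    hit_rate         = P(E | H)
    false_alarm_rate = P(E | not H)
    """
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    return hit_rate * prior / evidence


# Hypothetical values, for illustration only.
base_rate = 0.01         # H is rare
hit_rate = 0.90          # evidence is likely when H holds
false_alarm_rate = 0.10  # evidence sometimes appears without H

# Correct computation: the low base rate keeps the posterior small.
with_base_rate = posterior(base_rate, hit_rate, false_alarm_rate)

# Base-rate neglect: behaving as if H and not-H were equally likely a priori.
neglecting_base_rate = posterior(0.5, hit_rate, false_alarm_rate)

print(f"with base rate:       {with_base_rate:.3f}")
print(f"neglecting base rate: {neglecting_base_rate:.3f}")
```

Under these assumed numbers, the evidence looks strong on its own, yet the rare base rate means the hypothesis remains unlikely; a system that ignores the base rate would substantially overestimate it.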

Metacognitive Myopia in Large Language Models

Cognitive Bias in Decision-Making with LLMs

From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models

Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs

CBEval: A framework for evaluating and interpreting cognitive biases in LLMs

Large Language Models Engineer Too Many Simple Features For Tabular Data

Anthropocentric bias and the possibility of artificial cognition

The Life Cycle of Large Language Models: A Review of Biases in Education

Uncovering Biases with Reflective Large Language Models

Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?

"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models

Evaluation and mitigation of cognitive biases in medical language models

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models

Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

A Comprehensive Evaluation of Cognitive Biases in LLMs

Gender bias and stereotypes in Large Language Models

Understanding Intrinsic Socioeconomic Biases in Large Language Models

Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions