Abstract: Language models are known to absorb biases from their training data, leading to predictions driven by statistical regularities rather than semantic relevance. We investigate the impact of these biases on answer choice preferences in the Massive Multi-Task Language Understanding (MMLU) task. Our findings reveal that differences in learned regularities across answer options are predictive of model preferences and mirror human test-taking strategies. To address this issue, we introduce two novel methods: Counterfactual Prompting with Chain of Thought (CoT) and Counterfactual Prompting with Agnostically Primed CoT (APriCoT). We demonstrate that while Counterfactual Prompting with CoT alone is insufficient to mitigate bias, our novel Primed Counterfactual Prompting with CoT approach effectively reduces the influence of base-rate probabilities while improving overall accuracy. Our results suggest that mitigating bias requires a "System-2" like process and that CoT reasoning is susceptible to confirmation bias under some prompting methodologies. Our contributions offer practical solutions for developing more robust and fair language models.

What problem does this paper attempt to address?

This paper addresses the base-rate probability (BRP) effect in large language models (LLMs). Specifically, the authors focus on reducing bias in these models' answer-choice preferences on the Massive Multi-Task Language Understanding (MMLU) task. The main problems and proposed solutions are as follows:

### Main Problems

1. **The Influence of the Base-Rate Probability Effect**: LLMs absorb biases from their training data, so their predictions rely more on statistical regularities than on semantic relevance. This bias can affect the model's preference among answer options.
2. **The Limitations of Chain of Thought (CoT)**: Although the traditional Chain of Thought method can promote more complex reasoning, in some cases it exacerbates the base-rate probability effect, in a way that resembles confirmation bias.

### Solutions

To address these problems, the authors propose two new methods:

1. **Combining Counterfactual Prompting with Chain of Thought (CF + CoT)**: This method attempts to mitigate the base-rate probability effect by introducing counterfactual prompts, but the experimental results show that it actually amplifies the effect.
2. **Combining Counterfactual Prompting with Agnostically Primed Chain of Thought (APriCoT)**: This is an improved method. By evaluating each answer option independently and requiring the model to provide coherent comments, it reduces bias and improves accuracy.

### Experimental Results

- **CF + CoT**: Experiments show that CF + CoT not only fails to mitigate the base-rate probability effect but makes it more pronounced, causing the model's selection distribution to deviate from the true answer distribution.
- **APriCoT**: In contrast, APriCoT reduces the influence of the base-rate probability effect, bringing the model's selection distribution closer to the true answer distribution and significantly improving the model's accuracy.
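The comparison between a model's selection distribution and the true answer distribution can be made concrete with a small sketch. The summary does not say which metric the paper uses, so the total variation distance here is an assumption chosen for simplicity, and the answer lists are made-up illustrative data, not results from the paper.

```python
from collections import Counter

OPTIONS = ("A", "B", "C", "D")

def distribution(labels):
    """Return the relative frequency of each answer option in `labels`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {opt: counts.get(opt, 0) / len(labels) for opt in OPTIONS}

def total_variation(p, q):
    """Total variation distance between two distributions over OPTIONS."""
    return 0.5 * sum(abs(p[opt] - q[opt]) for opt in OPTIONS)

# Hypothetical data: 20 questions whose correct answers are spread evenly.
true_answers = list("ABCD" * 5)
# Hypothetical model that over-selects option "A".
biased_preds = list("A" * 11 + "B" * 4 + "C" * 3 + "D" * 2)
# Hypothetical model whose choices track the answer distribution more closely.
balanced_preds = list("A" * 6 + "B" * 5 + "C" * 5 + "D" * 4)

truth = distribution(true_answers)
tv_biased = total_variation(distribution(biased_preds), truth)
tv_balanced = total_variation(distribution(balanced_preds), truth)

print(f"biased model:   TV distance = {tv_biased:.2f}")
print(f"balanced model: TV distance = {tv_balanced:.2f}")
```

A smaller distance means the selection distribution sits closer to the true answer distribution. Note that this measures only the shape of the distribution over options, so it should be read alongside accuracy rather than in place of it.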
### Conclusion

By introducing the APriCoT method, the authors alleviate the influence of the base-rate probability effect on the behavior of LLMs and improve model performance on the MMLU task. This provides a practical solution for developing more robust and fair language models.

### Formula Representation

The base-rate probability effect discussed in the paper can be represented by the following formula:

\[ P(\text{Choice} \mid \text{Context}) = \frac{P(\text{Context} \mid \text{Choice}) \cdot P(\text{Choice})}{P(\text{Context})} \]

where \( P(\text{Choice}) \) is the base-rate probability (BRP), \( P(\text{Context} \mid \text{Choice}) \) is the probability of the context given the answer option, and \( P(\text{Context}) \) is the total probability of the context.

In this way, the paper shows how an improved prompting strategy can mitigate the base-rate probability effect in LLMs, thereby improving the accuracy and fairness of the model.
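As a rough numerical illustration of the formula above, the sketch below applies Bayes' rule to four answer options. All probabilities are invented for illustration and the function name `posterior` is hypothetical; the point is only to show how a skewed \( P(\text{Choice}) \) can override a mild preference in \( P(\text{Context} \mid \text{Choice}) \).

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule: P(Choice | Context) for every answer option."""
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(Context), the normalizing constant
    return {c: joint[c] / evidence for c in joint}

# Hypothetical P(Context | Choice): the context mildly favors option "B".
likelihood = {"A": 0.25, "B": 0.30, "C": 0.25, "D": 0.20}

# With a uniform base rate, the posterior simply follows the likelihood.
uniform_prior = {c: 0.25 for c in likelihood}
# Hypothetical skewed base rate: the model favors "A" before seeing any context.
skewed_prior = {"A": 0.40, "B": 0.20, "C": 0.20, "D": 0.20}

post_uniform = posterior(uniform_prior, likelihood)
post_skewed = posterior(skewed_prior, likelihood)

best_uniform = max(post_uniform, key=post_uniform.get)
best_skewed = max(post_skewed, key=post_skewed.get)

print("uniform prior picks:", best_uniform)
print("skewed prior picks: ", best_skewed)
```

Under the uniform prior the choice follows the context, while under the skewed prior the base rate decides the outcome. This is the behavior the summary attributes to the base-rate probability effect, shown here on toy numbers only.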

Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning

Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Pattern-Aware Chain-of-Thought Prompting in Large Language Models

Chain-of-Thought Reasoning Without Prompting

Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

Think Beyond Size: Adaptive Prompting for More Effective Reasoning

Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning

A comparison of chain-of-thought reasoning strategies across datasets and models

Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning

Stress Testing Chain-of-Thought Prompting for Large Language Models

An automatically discovered chain-of-thought prompt generalizes to novel models and datasets

Cognitive Bias in Decision-Making with LLMs

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Chain-of-Thought Augmentation with Logit Contrast for Enhanced Reasoning in Language Models

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods