Comparing Rationality Between Large Language Models and Humans: Insights and Open Questions

Dana Alsagheer, Rabimba Karanjai, Nour Diallo, Weidong Shi, Yang Lu, Suha Beydoun, Qiaoning Zhang
2024-03-15
Abstract: This paper delves into the dynamic landscape of artificial intelligence, specifically focusing on the burgeoning prominence of large language models (LLMs). We underscore the pivotal role of Reinforcement Learning from Human Feedback (RLHF) in augmenting LLMs' rationality and decision-making prowess. By meticulously examining the intricate relationship between human interaction and LLM behavior, we explore questions surrounding rationality and performance disparities between humans and LLMs, with particular attention to the Chat Generative Pre-trained Transformer. Our research employs comprehensive comparative analysis and delves into the inherent challenges of irrationality in LLMs, offering valuable insights and actionable strategies for enhancing their rationality. These findings hold significant implications for the widespread adoption of LLMs across diverse domains and applications, underscoring their potential to catalyze advancements in artificial intelligence.
Computers and Society
What problem does this paper attempt to address?
This paper addresses four main problems:

1. **Comparing rationality in large language models (LLMs) and humans**: The paper compares the performance of LLMs (especially ChatGPT) with that of humans across several tasks, looking for similarities and differences in decision-making and problem-solving. The test tasks include the Wason Selection Task, the Conjunction Fallacy Test, and Stereotype Base Rate Neglect.
2. **Exploring irrational behavior in LLMs and its causes**: Although LLMs perform well on some tasks, the study finds clear deficiencies in logical reasoning and in avoiding cognitive biases. For example, ChatGPT rarely applies the logical rule correctly in the Wason Selection Task, and it also performs poorly on the Conjunction Fallacy Test. These findings point to limits in how LLMs understand and apply complex logical reasoning.
3. **Proposing strategies to improve the rationality of LLMs**: Based on these findings, the paper proposes optimizing models through more refined human-feedback mechanisms and strengthening model transparency and auditing. The researchers argue that these measures make the decision-making process of LLMs easier to understand and help reduce irrational behavior.
4. **Discussing the impact of human feedback on LLM training**: The paper also examines the biases and irrational factors that human feedback may introduce during training. The researchers point out that although human feedback brings a model closer to human cognition, it may also cause the model to inherit human irrational behaviors. Designing a feedback mechanism that ensures the model learns from high-quality feedback is therefore an important research direction.

In summary, the paper describes the current state and challenges of rational decision-making in LLMs and offers references and suggestions for future research.
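
To make two of the named tasks concrete, here is a minimal Python sketch of what a "rational" answer looks like in each. It assumes the textbook versions of the tasks: the card faces, the vowel/even rule, and the function names `violates_rule`, `cards_to_turn`, and `respects_conjunction_rule` are illustrative assumptions, not the prompts or code used in the paper.

```python
# Textbook Wason Selection Task (assumed wording, not the paper's prompt):
# rule = "if a card has a vowel on one side, it has an even number on the other".
# Each card has a letter on one side and a number on the other.
LETTERS = ["A", "K"]   # hypothetical letter faces
NUMBERS = ["4", "7"]   # hypothetical number faces


def violates_rule(letter: str, number: str) -> bool:
    """A card breaks the rule only if it pairs a vowel with an odd number."""
    return letter in "AEIOU" and int(number) % 2 == 1


def cards_to_turn(visible_faces: list[str]) -> list[str]:
    """Return the visible faces whose hidden side could falsify the rule."""
    chosen = []
    for face in visible_faces:
        if face.isalpha():
            # Hidden side is a number: check every possible number.
            could_violate = any(violates_rule(face, n) for n in NUMBERS)
        else:
            # Hidden side is a letter: check every possible letter.
            could_violate = any(violates_rule(l, face) for l in LETTERS)
        if could_violate:
            chosen.append(face)
    return chosen


def respects_conjunction_rule(p_a: float, p_a_and_b: float) -> bool:
    """The conjunction rule: P(A and B) can never exceed P(A)."""
    return p_a_and_b <= p_a


if __name__ == "__main__":
    print(cards_to_turn(["A", "K", "4", "7"]))
    print(respects_conjunction_rule(p_a=0.1, p_a_and_b=0.3))
```

Under these assumptions, only the vowel card and the odd-number card can falsify the rule, so a respondent (human or LLM) who picks the even-number card is making the classic error. Likewise, any answer that rates a conjunction as more probable than one of its parts fails the second check.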