Abstract: When making decisions under uncertainty, individuals often deviate from rational behavior. These deviations can be evaluated along three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Several empirical studies have investigated the rationality and social behavior of LLMs, yet their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of LLMs. Through a multiple-choice-list experiment, we estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities. However, the degree to which these behaviors are expressed varies significantly across LLMs. We also explore their behavior when they are prompted with socio-demographic features, uncovering significant disparities. For instance, when given attributes of sexual minority groups or physical disabilities, Claude-3-Opus displays increased risk aversion, leading to more conservative choices. These findings underscore the need for careful consideration of the ethical implications and potential biases in deploying LLMs in decision-making scenarios. This study therefore advocates for developing standards and guidelines to ensure that LLMs operate within ethical boundaries while enhancing their utility in complex decision-making environments.
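
To make the three dimensions concrete, the following is a minimal sketch of one common prospect-theory parameterization. The abstract does not state which functional forms the paper uses, so the power value function, the one-parameter Prelec weighting function, and the names `sigma`, `alpha`, and `lam` are all assumptions chosen for illustration, not the authors' specification.

```python
import math


def value(x, sigma, lam):
    """Assumed power value function.

    sigma < 1 gives a concave value function over gains (risk aversion);
    lam > 1 scales losses more heavily than gains (loss aversion).
    """
    if x >= 0:
        return x ** sigma
    return -lam * ((-x) ** sigma)


def prelec_weight(p, alpha):
    """Assumed one-parameter Prelec weighting: w(p) = exp(-(-ln p)^alpha).

    alpha < 1 overweights small probabilities; alpha = 1 gives w(p) = p.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** alpha))


def prospect_value(x, p, y, sigma, alpha, lam):
    """Subjective value of a binary lottery: x with probability p, else y."""
    vx, vy = value(x, sigma, lam), value(y, sigma, lam)
    if x * y < 0:
        # Mixed lottery: weight the gain and the loss separately.
        return prelec_weight(p, alpha) * vx + prelec_weight(1.0 - p, alpha) * vy
    # Same-sign lottery: start from the outcome closer to zero and
    # weight the increment to the more extreme outcome.
    if abs(x) >= abs(y):
        return vy + prelec_weight(p, alpha) * (vx - vy)
    return vx + prelec_weight(1.0 - p, alpha) * (vy - vx)


# Illustrative parameter values only (hypothetical, not estimates from the paper).
risky = prospect_value(100, 0.5, 0, sigma=0.5, alpha=1.0, lam=1.0)
safe = value(50, sigma=0.5, lam=1.0)
prefers_safe_option = safe > risky  # a risk-averse agent prefers the sure amount
```

Under this parameterization, a multiple-choice-list experiment would locate the row at which an agent switches between two lotteries, and that switching point bounds the parameter values consistent with the observed choices.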

Rationality Report Cards: Assessing the Economic Rationality of Large Language Models

STEER: Assessing the Economic Rationality of Large Language Models

(Ir)rationality and cognitive biases in large language models

Large Language Model As Autonomous Decision Maker

Rational Decision-Making Agent with Internalized Utility Judgment

EconNLI: Evaluating Large Language Models on Economics Reasoning

Economics Arena for Large Language Models

Comparing Rationality Between Large Language Models and Humans: Insights and Open Questions

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

Large Language Models Assume People are More Rational than We Really are

The Moral Mind(s) of Large Language Models

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Large Language Models: An Applied Econometric Framework

Evaluating Large Language Models on Financial Report Summarization: An Empirical Study

Towards Rationality in Language and Multimodal Agents: A Survey

From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls

LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory

LLM-driven Imitation of Subrational Behavior : Illusion or Reality?

The Emergence of Economic Rationality of GPT

Large Legislative Models: Towards Efficient AI Policymaking in Economic Simulations