STEER: Assessing the Economic Rationality of Large Language Models

Narun Raman, Taylor Lundy, Samuel Amouyal, Yoav Levine, Kevin Leyton-Brown, Moshe Tennenholtz

2024-05-29

Abstract: There is increasing interest in using LLMs as decision-making "agents." Doing so includes many degrees of freedom: which model should be used; how should it be prompted; should it be asked to introspect, conduct chain-of-thought reasoning, etc.? Settling these questions -- and more broadly, determining whether an LLM agent is reliable enough to be trusted -- requires a methodology for assessing such an agent's economic rationality. In this paper, we provide one. We begin by surveying the economic literature on rational decision making, taxonomizing a large set of fine-grained "elements" that an agent should exhibit, along with dependencies between them. We then propose a benchmark distribution that quantitatively scores an LLM's performance on these elements and, combined with a user-provided rubric, produces a "STEER report card." Finally, we describe the results of a large-scale empirical experiment with 14 different LLMs, characterizing both the current state of the art and the impact of different model sizes on models' ability to exhibit rational behavior.

Computation and Language, General Economics

What problem does this paper attempt to address?

The paper attempts to address the issue of evaluating the economic rationality of large language models (LLMs) when used as decision agents. Specifically, the researchers face the following key questions:

1. **Choosing the right model**: How to select the most suitable LLM to perform specific decision tasks?
2. **Designing effective prompts**: How to optimize the performance of LLMs through prompting, such as whether chain-of-thought reasoning is needed?
3. **Evaluating economic rationality**: How to systematically assess the performance of an LLM in various economic decision tasks to ensure its behavior is reliable and trustworthy?

To answer these questions, the paper proposes a benchmark framework named STEER (Systematic and Tuneable Evaluation of Economic Rationality). The main contributions of the STEER framework include:

- **Classification of elements of economic rationality**: The paper first provides a detailed classification of economic rationality, defining 64 specific "rationality elements" that cover various aspects from basic mathematical abilities to decision-making in complex multi-agent environments.
- **Generation and validation of test questions**: Based on the above classification, the researchers generated a large number of multiple-choice questions and ensured the quality and accuracy of these questions through manual validation.
- **STEER report card**: Using the STEER framework, detailed report cards can be generated to evaluate the performance of different LLMs in various decision tasks, including the impact of factors such as model size, self-explanation, and few-shot prompting on performance.

Through this systematic evaluation method, the researchers hope to provide a reliable assessment standard for the application of LLMs in the field of economic decision-making, thereby promoting further development in this area.
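To make the idea of combining per-element scores with a user-provided rubric concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names (`element_accuracy`, `report_card`), the element labels, and the choice of a weighted average as the aggregation rule are all assumptions made for illustration. The paper may aggregate scores differently.

```python
from collections import defaultdict


def element_accuracy(responses):
    """Compute accuracy per element.

    `responses` is an iterable of (element_name, is_correct) pairs,
    one per multiple-choice question answered by the model.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for element, is_correct in responses:
        totals[element] += 1
        correct[element] += int(is_correct)
    return {element: correct[element] / totals[element] for element in totals}


def report_card(responses, rubric):
    """Combine per-element accuracies with a rubric of weights.

    `rubric` maps element names to non-negative weights (hypothetical
    format). The overall score is a weighted average over the elements
    that appear in both the rubric and the responses.
    """
    accuracy = element_accuracy(responses)
    weights = {e: w for e, w in rubric.items() if e in accuracy and w > 0}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("rubric assigns no positive weight to any scored element")
    overall = sum(accuracy[e] * w for e, w in weights.items()) / total_weight
    return {"per_element": accuracy, "overall": overall}


# Hypothetical element labels and graded answers, for illustration only.
example_responses = [
    ("arithmetic", True),
    ("arithmetic", False),
    ("expected_utility", True),
]
example_rubric = {"arithmetic": 1.0, "expected_utility": 3.0}
card = report_card(example_responses, example_rubric)
print(card["per_element"], card["overall"])
```

In this sketch the rubric lets a user emphasize the elements that matter for their application: an element with weight zero, or one absent from the rubric, does not affect the overall score.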

Rationality Report Cards: Assessing the Economic Rationality of Large Language Models

Large Language Model As Autonomous Decision Maker

Economics Arena for Large Language Models

Large Language Models: An Applied Econometric Framework

EconNLI: Evaluating Large Language Models on Economics Reasoning

Rational Decision-Making Agent with Internalized Utility Judgment

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

The Economic Implications of Large Language Model Selection on Earnings and Return on Investment: A Decision Theoretic Model

Evaluating Large Language Models on Financial Report Summarization: An Empirical Study

Large Legislative Models: Towards Efficient AI Policymaking in Economic Simulations

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Exploring and steering the moral compass of Large Language Models

Interacting Large Language Model Agents. Interpretable Models and Social Learning

The Moral Mind(s) of Large Language Models

Large Language Models Assume People are More Rational than We Really are

(Ir)rationality and cognitive biases in large language models

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

An Experimental Study of Competitive Market Behavior Through LLMs