Abstract: Game theory, as an analytical tool, is frequently utilized to analyze human behavior in social science research. With the high alignment between the behavior of Large Language Models (LLMs) and humans, a promising research direction is to employ LLMs as substitutes for humans in game experiments, enabling social science research. However, despite numerous empirical studies on the combination of LLMs and game theory, the capability boundaries of LLMs in game theory remain unclear. In this research, we endeavor to systematically analyze LLMs in the context of game theory. Specifically, rationality, as the fundamental principle of game theory, serves as the metric for evaluating players' behavior -- building a clear desire, refining belief about uncertainty, and taking optimal actions. Accordingly, we select three classical games (dictator game, Rock-Paper-Scissors, and ring-network game) to analyze to what extent LLMs can achieve rationality in these three aspects. The experimental results indicate that even the current state-of-the-art LLM (GPT-4) exhibits substantial disparities compared to humans in game theory. For instance, LLMs struggle to build desires based on uncommon preferences, fail to refine belief from many simple patterns, and may overlook or modify refined belief when taking actions. Therefore, we consider that introducing LLMs into game experiments in the field of social science should be approached with greater caution.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper aims to systematically analyze the capability boundaries of large language models (LLMs) in game theory. Specifically, the paper evaluates whether LLMs can achieve the three characteristics of rational players through three classic games (Dictator Game, Rock-Paper-Scissors Game, Ring Network Game):

1. **Constructing Clear Desires**: Establishing specific opinions on each outcome in the game based on preferences.
2. **Refining Beliefs about Uncertainty**: Extracting the probability distribution of opponents' behaviors from game information.
3. **Taking Optimal Actions**: Choosing the best action based on desires and beliefs.

### Research Background

Game theory, as a mathematical tool for analyzing human behavior, is widely used in social sciences (such as economics, psychology, sociology, etc.). With the development of large language models, their high consistency with human behavior has led researchers to consider using LLMs as substitutes for humans in social science research. However, despite many empirical studies combining LLMs and game theory, the capability boundaries of LLMs in game theory remain unclear.

### Research Methods

1. **Dictator Game**: Used to evaluate whether LLMs can construct clear desires based on different preferences. Experimental results show that LLMs perform well under common preferences but poorly under uncommon preferences.
2. **Rock-Paper-Scissors Game**: Used to evaluate whether LLMs can refine beliefs from simple patterns. Experimental results show that even the most advanced GPT-4 finds it difficult to refine beliefs from many simple patterns.
3. **Ring Network Game**: Used to evaluate whether LLMs can take optimal actions given certain beliefs. Experimental results show that LLMs can improve their ability to take optimal actions in some cases but still tend to ignore or modify already refined beliefs.

### Main Findings

1. **Constructing Clear Desires**: LLMs can construct clear desires under common preferences but perform poorly under uncommon preferences.
2. **Refining Beliefs about Uncertainty**: LLMs find it difficult to refine beliefs from many simple patterns, especially in game experiments that require handling complex beliefs.
3. **Taking Optimal Actions**: LLMs can improve their ability to take optimal actions in some cases but still tend to ignore or modify already refined beliefs.

### Conclusion

The paper systematically explores the capability boundaries of LLMs in game theory and points out that caution should be exercised when introducing LLMs into social science research. Although GPT-4 performs well in some aspects, overall, LLMs still have significant shortcomings in handling complex beliefs and taking optimal actions.
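
### Illustrative Sketch

The three components of rationality summarized above (desire, belief, optimal action) can be made concrete with a small Rock-Paper-Scissors example. The following is a minimal sketch and not the paper's experimental setup: the win/tie/loss payoffs of +1/0/-1, the add-one-smoothed frequency model used as the "belief", and the hypothetical opponent history are all assumptions chosen for illustration.

```python
from collections import Counter

MOVES = ("rock", "paper", "scissors")
# BEATS[a] is the move that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine: str, theirs: str) -> int:
    """Desire: +1 for a win, 0 for a tie, -1 for a loss (assumed payoffs)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def refine_belief(history: list[str]) -> dict[str, float]:
    """Belief: add-one-smoothed frequency of the opponent's past moves.

    This simple frequency model is an assumption for illustration only.
    """
    counts = Counter(history)
    total = len(history) + len(MOVES)
    return {move: (counts[move] + 1) / total for move in MOVES}


def best_action(belief: dict[str, float]) -> str:
    """Optimal action: the move with the highest expected payoff."""
    def expected(mine: str) -> float:
        return sum(p * payoff(mine, theirs) for theirs, p in belief.items())

    return max(MOVES, key=expected)


# Hypothetical opponent who mostly plays rock.
history = ["rock"] * 8 + ["paper"] * 2
belief = refine_belief(history)
action = best_action(belief)
print(belief, action)
```

Under this sketch, a rational player facing an opponent who mostly plays rock should settle on paper. The failure modes reported in the summary correspond to errors in `refine_belief` (not extracting the pattern) or in `best_action` (ignoring or altering the belief once it is formed).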

Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis

Large Language Models Assume People are More Rational than We Really are

Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

Playing Games With GPT: What Can We Learn About a Large Language Model From Canonical Strategic Games?

Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?

Strategic Behavior of Large Language Models: Game Structure vs. Contextual Framing

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

The Emergence of Strategic Reasoning of Large Language Models

Large Language Models Playing Mixed Strategy Nash Equilibrium Games

Economics Arena for Large Language Models

Comparing Rationality Between Large Language Models and Humans: Insights and Open Questions

Can Large Language Model Agents Simulate Human Trust Behavior?

Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs

Strategic behavior of large language models and the role of game structure versus contextual framing

Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models

Game-theoretic LLM: Agent Workflow for Negotiation Games

Large language models as linguistic simulators and cognitive models in human research

Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina

The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games

Do Large Language Models Learn Human-Like Strategic Preferences?

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations