Can LLMs Replace Economic Choice Prediction Labs? The Case of Language-based Persuasion Games

Eilam Shapira, Omer Madmon, Roi Reichart, Moshe Tennenholtz
2024-08-15
Abstract: Human choice prediction in economic contexts is crucial for applications in marketing, finance, public policy, and more. This task, however, is often constrained by the difficulties in acquiring human choice data. With most experimental economics studies focusing on simple choice settings, the AI community has explored whether LLMs can substitute for humans in these predictions and examined more complex experimental economics settings. However, a key question remains: can LLMs generate training data for human choice prediction? We explore this in language-based persuasion games, a complex economic setting involving natural language in strategic interactions. Our experiments show that models trained on LLM-generated data can effectively predict human behavior in these games and even outperform models trained on actual human data.
Machine Learning, Artificial Intelligence, Computation and Language, Computer Science and Game Theory, Human-Computer Interaction
What problem does this paper attempt to address?
The paper asks whether large language models (LLMs) can replace human participants as a source of training data for economic choice prediction. Specifically, the authors test whether predictive models trained on LLM-generated data from language-based persuasion games can predict human behavior, and whether they can match or even outperform models trained on real human data.

### Background and Motivation

In fields such as economics and marketing, accurately predicting human choice behavior is crucial. However, collecting real human choice data is difficult: it raises privacy issues, is subject to legal restrictions, and is costly and slow. Researchers have therefore begun to explore whether synthetic data generated by LLMs can replace or supplement real data when training predictive models.

### Research Methods

1. **Data Generation**:
   - **Human Data**: The authors used a dataset collected by Shapira et al. (2023) through a mobile application, which includes interactions between human players and robot experts.
   - **LLM-Generated Data**: The authors used 5 different LLMs (Google's Chat-Bison and Gemini-1.5, Alibaba's Qwen-2 72B, and Meta's Llama-3 70B and 8B) to generate synthetic data. To increase data diversity, they also applied role diversification, adjusting the prompts to simulate different types of player behavior.
2. **Predictive Models**:
   - The authors compared four predictive models (LSTM, Mamba, Transformer, and XGBoost) across different training data sources: human data, LLM-generated data, and a baseline based solely on sentiment analysis.

### Key Findings

1. **Effectiveness of LLM-Generated Data**:
   - When the sample size is sufficiently large, predictive models trained on LLM-generated data outperformed those trained on real human data at predicting human choices.
   - This suggests that LLM-generated data can not only simulate human behavior but also capture the interaction patterns that matter for prediction.
2. **Combining LLM and Human Data**:
   - Combining LLM-generated data with real human data can further improve the accuracy and robustness of predictive models.
   - In the experiments, models trained on mixed datasets outperformed models trained on only human data or only LLM-generated data.

### Conclusion

This study shows that LLM-generated data has significant potential for economic choice prediction, especially where real human data is hard to obtain, since it can serve as an efficient, low-cost alternative. Combining LLM-generated data with real human data can improve predictive performance further. The finding is relevant to research in economics and the social sciences and suggests directions for future studies.
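
The comparison described under Research Methods (train a predictor on each data source, then score every predictor on the same held-out human decisions) can be sketched as follows. This is a minimal illustration, not the authors' code: the one-feature threshold classifier stands in for the LSTM, Mamba, Transformer, and XGBoost models, the names `fit_threshold` and `compare_sources` are hypothetical, and the toy numbers are made up to exercise the code and say nothing about the paper's results.

```python
from typing import Dict, List, Sequence, Tuple

# One example = (feature vector, binary decision). The single feature is a
# hypothetical stand-in for whatever representation a real model would use.
Example = Tuple[Sequence[float], int]


def fit_threshold(train: List[Example]) -> float:
    """Toy model: cutoff on feature 0, halfway between the two class means."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold: float, x: Sequence[float]) -> int:
    """Predict decision 1 when feature 0 reaches the learned cutoff."""
    return 1 if x[0] >= threshold else 0


def accuracy(threshold: float, test: List[Example]) -> float:
    """Fraction of test decisions predicted correctly."""
    correct = sum(predict(threshold, x) == y for x, y in test)
    return correct / len(test)


def compare_sources(
    sources: Dict[str, List[Example]], human_test: List[Example]
) -> Dict[str, float]:
    """Train one model per data source; score all on the same human test set."""
    return {
        name: accuracy(fit_threshold(train), human_test)
        for name, train in sources.items()
    }


# Made-up toy data (assumption): a small human set and a larger synthetic set.
human_train: List[Example] = [([0.2], 0), ([0.9], 1)]
llm_train: List[Example] = [([0.1], 0), ([0.3], 0), ([0.7], 1), ([0.9], 1)]
human_test: List[Example] = [([0.52], 1), ([0.4], 0), ([0.8], 1), ([0.1], 0)]

results = compare_sources(
    {"human": human_train, "llm": llm_train, "mixed": human_train + llm_train},
    human_test,
)
for name, acc in results.items():
    print(f"{name}: accuracy on held-out human decisions = {acc:.2f}")
```

The key design point is that the test set is always human data, whatever the training source, so the scores for the human-only, LLM-only, and mixed settings are directly comparable.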