Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap
2024-10-04
Abstract: Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores the challenges that large language models (LLMs) face when simulating social interactions, particularly under information asymmetry. Specifically, it addresses the following issues:

1. **Impact of Information Asymmetry**:
   - Many existing studies simulate social interactions from an omniscient perspective (i.e., a single LLM generates the behavior of every interlocutor), which differs fundamentally from the non-omniscient, information-asymmetric interactions between humans and AI agents in the real world.
   - The authors develop an evaluation framework that compares simulations run from an omniscient perspective (SCRIPT mode) with simulations run from a non-omniscient perspective (AGENTS mode), in order to assess how information asymmetry affects LLM performance.
2. **Comparison of Simulation Outcomes**:
   - From the omniscient perspective, LLMs achieve social goals more successfully and generate more natural dialogues.
   - From the non-omniscient perspective, LLMs perform worse on both goal achievement and dialogue naturalness, especially in the presence of information asymmetry.
3. **Learning from Omniscient Simulations**:
   - The authors explore whether fine-tuning LLMs on data generated from an omniscient perspective can improve their performance from a non-omniscient perspective.
   - The results show that fine-tuning makes the models' dialogues more natural in the non-omniscient setting but does little to improve their goal achievement, particularly in tasks that require cooperation.
4. **Impact of Data Bias**:
   - The authors analyze biases in the omniscient-perspective simulation data and find that these biases limit how well the model's social skills generalize to real-world scenarios.
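To make the SCRIPT/AGENTS distinction concrete, the following is a minimal Python sketch of what each mode lets the model see. It is an illustration under stated assumptions, not the authors' framework: the `Agent` class, the two prompt-building functions, their prompt wording, and the bike-negotiation scenario are all hypothetical. The only point it demonstrates is where private information ends up.

```python
# Hypothetical illustration of omniscient vs. non-omniscient contexts;
# not the paper's code. Prompt wording and names are invented.
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    background: str    # information any participant may see
    private_goal: str  # information only this agent should see in AGENTS mode


def script_mode_prompt(scenario: str, agents: list[Agent]) -> str:
    """Omniscient (SCRIPT-style) context: one generator writes every turn
    and therefore sees every participant's private goal."""
    lines = [f"Scenario: {scenario}"]
    for a in agents:
        lines.append(f"{a.name} ({a.background}) wants to: {a.private_goal}")
    lines.append("Write the full conversation between all participants.")
    return "\n".join(lines)


def agents_mode_prompt(scenario: str, me: Agent, others: list[Agent],
                       history: list[str]) -> str:
    """Non-omniscient (AGENTS-style) context: each agent is prompted
    separately and sees only its own goal, the others' public background,
    and the dialogue so far."""
    lines = [f"Scenario: {scenario}",
             f"You are {me.name} ({me.background}). Your goal: {me.private_goal}"]
    for o in others:
        lines.append(f"You are talking with {o.name} ({o.background}).")
    lines.extend(history)
    lines.append(f"Write {me.name}'s next turn only.")
    return "\n".join(lines)


if __name__ == "__main__":
    buyer = Agent("Alex", "a student", "buy the bike for at most $100")
    seller = Agent("Sam", "a bike owner", "sell the bike for at least $150")
    scenario = "Negotiating the price of a used bike."

    omniscient = script_mode_prompt(scenario, [buyer, seller])
    buyer_view = agents_mode_prompt(scenario, buyer, [seller], history=[])

    # The omniscient prompt exposes the seller's reservation price; the
    # buyer's own prompt does not, which is the asymmetry AGENTS mode keeps.
    print(seller.private_goal in omniscient)  # True
    print(seller.private_goal in buyer_view)  # False
```

In this sketch, sending `script_mode_prompt` to a single model once corresponds to the omniscient setting, while sending `agents_mode_prompt` to a model once per turn for whichever agent is speaking corresponds to the non-omniscient setting. The two differ only in which private goals reach the context, which is the information asymmetry the summary identifies as the key challenge.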
### Main Findings

- **Information Asymmetry Is a Key Challenge**: Information asymmetry significantly degrades LLM performance from a non-omniscient perspective, making it difficult for the models to achieve social goals and generate natural dialogues.
- **Limitations of the Omniscient Perspective**: Although omniscient simulations score better on some metrics, they do not accurately reflect real-world social interactions.
- **Limitations of Fine-Tuning**: Fine-tuning LLMs on omniscient-perspective data can make their dialogues more natural but does little to help them complete complex social tasks.
- **Impact of Data Bias**: Biases in omniscient-perspective simulation data limit the model's applicability to real-world scenarios.

### Recommendations

Based on these findings, the authors recommend that work on LLM-driven agents be reported with more caution and transparency about the data and learning methods used, so that the authenticity and reliability of simulation results can be assessed.