Abstract: Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation, as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompting and a lack of detail in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.

What problem does this paper attempt to address?

This paper evaluates how well agents based on large language models (LLMs) simulate human group behavior, and in particular whether such agents exhibit phenomena similar to human collective intelligence. Specifically, the researchers focus on the "wisdom of partisan crowds": the ability of human groups to converge on more accurate beliefs through discussion, even in the presence of political polarization and partisan bias.

### Main Research Questions:

1. **Can LLM agents simulate human partisan bias?**
   - The researchers examine whether LLM agents, when assigned the roles of different political parties (e.g., Democrat or Republican), exhibit partisan biases similar to those of humans.
2. **Can LLM agents converge to more accurate beliefs through discussion?**
   - By simulating the discussion process within human groups, the researchers assess whether LLM agents can, like humans, reduce errors and improve the accuracy of their estimates through social interaction.
3. **What factors influence the collective intelligence effect of LLM agents?**
   - The researchers explore how factors such as the level of detail in role backgrounds and the use of chain-of-thought reasoning affect the collective intelligence effect of LLM agents.

### Research Methods:

- **Experimental Design**: The researchers adopted the experimental design of Becker et al. (2019), having LLM agents answer factual questions known to elicit partisan bias and adjust their estimates over multiple rounds of interaction.
- **Role Setting**: LLM agents were assigned role backgrounds at different levels of detail, ranging from simple to detailed background information.
- **Chain-of-Thought Reasoning**: The researchers also examined the impact of chain-of-thought (CoT) reasoning on the performance of LLM agents.
- **Fine-Tuning**: Some LLM agents were fine-tuned on human data to enhance their ability to simulate human behavior.

### Main Findings:

- **Detailed Role Background and No Chain-of-Thought Reasoning**: LLM agents most closely resembled human groups, exhibiting significant partisan bias and collective intelligence effects, when they were given detailed role backgrounds and did not use chain-of-thought reasoning.
- **Impact of Chain-of-Thought Reasoning**: The use of chain-of-thought reasoning weakened the collective intelligence effect of LLM agents.
- **Impact of Fine-Tuning**: Fine-tuning LLM agents on human data significantly enhanced their ability to simulate human behavior but could lead to overfitting.

### Conclusion:

This study demonstrates the potential and limitations of LLM agents in simulating human collective intelligence. With detailed role backgrounds and appropriate design, LLM agents can exhibit collective intelligence effects similar to those of humans, but these effects may weaken in certain settings, such as when chain-of-thought reasoning is used. These findings provide useful guidance for future research, particularly for evaluating and improving the use of LLMs in simulations of social interaction.
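The multi-round design described under Research Methods can be illustrated with a small numerical sketch. This is not the paper's code: it replaces the LLM agents with a simple averaging rule, and the names `revise`, `deliberate`, `mean_abs_error`, the `self_weight` parameter, and the example numbers are all hypothetical. It only shows how one might measure whether a group's individual errors shrink after agents see their peers' estimates.

```python
import statistics


def revise(own: float, peers: list[float], self_weight: float) -> float:
    """Blend an agent's own estimate with the mean of its peers' estimates.

    Assumption: a fixed linear weighting stands in for however a real
    (human or LLM) agent would actually revise its answer.
    """
    return self_weight * own + (1.0 - self_weight) * statistics.mean(peers)


def deliberate(
    estimates: list[float], rounds: int = 2, self_weight: float = 0.5
) -> list[list[float]]:
    """Return the group's estimates per round, starting with the initial ones."""
    history = [list(estimates)]
    for _ in range(rounds):
        current = history[-1]
        updated = []
        for i, own in enumerate(current):
            # Each agent sees every other agent's estimate from the last round.
            peers = current[:i] + current[i + 1:]
            updated.append(revise(own, peers, self_weight))
        history.append(updated)
    return history


def mean_abs_error(estimates: list[float], truth: float) -> float:
    """Average absolute distance of individual estimates from the true value."""
    return statistics.mean(abs(e - truth) for e in estimates)


if __name__ == "__main__":
    truth = 100.0  # hypothetical true answer to a factual question
    initial = [60.0, 80.0, 90.0, 130.0, 150.0]  # hypothetical first answers
    for round_index, group in enumerate(deliberate(initial)):
        print(round_index, round(mean_abs_error(group, truth), 2))
```

Under this averaging rule the group mean stays fixed while individual estimates move toward it, so average individual error cannot increase. Real agents need not behave this way, which is why the comparison against human data matters.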

The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents
