Abstract: In an era where single large language models have dominated the landscape of artificial intelligence for years, multi-agent systems are emerging as new protagonists in conversational task-solving. While previous studies have showcased their potential in reasoning tasks and creative endeavors, an analysis of their limitations concerning conversational paradigms and the impact of individual agents is missing. It remains unclear how multi-agent discussions perform across tasks of varying complexity and how the structure of these conversations influences the process. To fill that gap, this work systematically evaluates multi-agent systems across various discussion paradigms, assessing their strengths and weaknesses in both generative and question-answering tasks. Alongside the experiments, I propose a taxonomy of 20 multi-agent research studies from 2022 to 2024 and introduce a framework for deploying multi-agent LLMs in conversational task-solving. I demonstrate that while multi-agent systems excel in complex reasoning tasks, outperforming a single model by leveraging expert personas, they fail on basic tasks. Concretely, I identify three challenges: 1) While longer discussions enhance reasoning, agents fail to maintain conformity to strict task requirements, which leads to problem drift and makes shorter conversations more effective for basic tasks. 2) Prolonged discussions risk alignment collapse, raising new safety concerns for these systems. 3) Agents can monopolize a discussion through long generations, which raises the problem of fairness in decision-making for tasks like summarization. This work uncovers both the potential and the challenges that arise with multi-agent interaction and varying conversational paradigms, providing insights into how future research could improve the efficiency, performance, and safety of multi-agent LLMs.

ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs

Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate

Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates

Improving Multi-Agent Debate with Sparse Communication Topology

MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate

Combating Adversarial Attacks with Multi-Agent Debate

A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning

GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion

Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System

Multi-Agent Large Language Models for Conversational Task-Solving

Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

DEBATE: Devil's Advocate-Based Assessment and Text Evaluation

Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate

ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration