Abstract: Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives research on the cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to iteratively refine its solution using feedback it generates itself. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" state and a judge manages the debate process to obtain a final solution. Our MAD framework encourages divergent thinking in LLMs, which is helpful for tasks that require deep levels of contemplation. Experimental results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that an adaptive break of the debate and a modest level of "tit for tat" are required for MAD to obtain good performance. Moreover, we find that an LLM might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.
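
The debate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked repository: the names `run_debate`, `Agent`, and `Judge`, the two-debater setup, the `max_rounds` parameter, and the fallback rule when the judge never decides are all hypothetical. Real agents would wrap LLM calls; here they are plain Python callables so the control flow can be run as written.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration only.
# A transcript is a list of (speaker, argument) pairs.
Transcript = List[Tuple[str, str]]
# An agent reads the question and the transcript so far, and returns an argument.
Agent = Callable[[str, Transcript], str]
# A judge returns a final answer, or None to let the debate continue.
Judge = Callable[[str, Transcript], Optional[str]]


def run_debate(
    question: str,
    affirmative: Agent,
    negative: Agent,
    judge: Judge,
    max_rounds: int = 3,
) -> Tuple[str, Transcript]:
    """Run a two-sided debate with an adaptive break decided by the judge."""
    transcript: Transcript = []
    for _ in range(max_rounds):
        # Each side argues in turn and can see everything said so far.
        transcript.append(("affirmative", affirmative(question, transcript)))
        transcript.append(("negative", negative(question, transcript)))
        # Adaptive break: stop as soon as the judge commits to an answer.
        verdict = judge(question, transcript)
        if verdict is not None:
            return verdict, transcript
    # Assumption: if the judge never decides, fall back to the last argument made.
    return transcript[-1][1], transcript


# Toy stand-ins for LLM-backed agents (deterministic, for demonstration).
def affirmative_stub(question: str, transcript: Transcript) -> str:
    return "10"


def negative_stub(question: str, transcript: Transcript) -> str:
    return "12"


def judge_stub(question: str, transcript: Transcript) -> Optional[str]:
    # Decide only after the negative side has spoken twice.
    negatives = [text for speaker, text in transcript if speaker == "negative"]
    return negatives[-1] if len(negatives) >= 2 else None


def undecided_judge(question: str, transcript: Transcript) -> Optional[str]:
    # Never commits, so the round limit ends the debate.
    return None


verdict, transcript = run_debate(
    "toy question", affirmative_stub, negative_stub, judge_stub
)
```

With these stubs the judge breaks the debate after the second round, so the transcript holds four turns. Keeping the judge separate from the debaters mirrors the abstract's description of a judge that manages the process, and makes it easy to swap in a different stopping rule.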

Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

Rational Decision-Making Agent with Internalized Utility Judgment

BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Can Language Representation Models Think in Bets?

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Enhance Reasoning for Large Language Models in the Game Werewolf

Large Language Model As Autonomous Decision Maker

Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

Conceptual and Unbiased Reasoning in Language Models

K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning

Enhancing Language Model Reasoning via Weighted Reasoning in Self-Consistency

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Strategic Reasoning with Language Models

DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning

Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining

Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming

Learning to Reason via Self-Iterative Process Feedback for Small Language Models

Building Decision Making Models Through Language Model Regime

Deliberating with AI: Improving Decision-Making for the Future through Participatory AI Design and Stakeholder Deliberation