Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?

Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song
2024-02-28
Abstract: Recent progress in LLMs discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach on a wide range of reasoning tasks and backbone LLMs. We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during the discussion.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper examines whether multi-agent discussion frameworks actually improve the reasoning capabilities of Large Language Models (LLMs), and re-evaluates that claim through systematic experiments.

**Specific issues include:**

1. **Performance comparison between single-agent and multi-agent discussions**: Given sufficiently strong prompts, a single-agent LLM achieves almost the same performance as the best existing multi-agent discussion methods. This holds in particular when the prompt contains demonstration examples.
2. **Conditions under which multi-agent discussions have advantages**: Multi-agent discussion outperforms a single agent only when the prompt contains no demonstration examples. Additionally, weaker LLMs (such as Bard) show improved performance when interacting with stronger LLMs (such as Gemini Pro).
3. **Common error types in discussions**: The paper analyzes two types of error that commonly occur in multi-agent discussions: Judge Mistake and Wrong Answer Propagation.

In summary, the paper re-evaluates the role of multi-agent discussion frameworks in LLM reasoning and offers new insights into how single-agent and multi-agent performance differ under different prompt conditions.