Abstract: Multi-agent debate has proven effective in improving the quality of large language models on reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, existing approaches take a brute-force approach to communication among agents: each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity in enhancing the efficiency and effectiveness of the "society of minds" approach.
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
This paper explores how to improve the Multi-Agent Debate (MAD) framework through a sparse communication topology. Specifically, it addresses the following issues:
1. **Reducing Computational Cost**: Existing multi-agent debate methods typically use a fully connected communication topology, in which each agent can communicate with all other agents. While effective, this design inflates each agent's input context as the number of agents and debate rounds grows, because every agent reads the previous-round solutions of all its peers, which results in high computational costs. The paper argues that reducing the number of reference solutions visible to each agent, i.e., using a sparse communication topology, can significantly lower computational costs while maintaining or even improving performance.
2. **Improving Efficiency and Effectiveness on Reasoning and Alignment Tasks**: The paper not only validates the effectiveness of sparse MAD on text reasoning tasks but also extends it to multimodal reasoning and alignment labeling tasks, demonstrating its broad applicability.
3. **Optimizing Multi-Agent Interaction**: The paper also investigates how interactions among large language models (LLMs) of different strengths within the MAD framework affect overall performance. Specifically, it finds that assigning stronger LLMs to agents with higher centrality in the communication topology significantly enhances overall performance.
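To make the idea of a sparse topology concrete, the sketch below builds a ring-style (circulant) communication graph in which each agent reads the previous-round answers of only `n_neighbors` peers, so the connectivity is D = n_neighbors / (n_agents - 1), and then runs a toy debate that ends in a majority vote. This is a minimal illustration under stated assumptions, not the paper's implementation: the circulant construction, the helper names (`ring_topology`, `toy_update`, `run_debate`), and the majority-vote aggregation are all hypothetical, and `toy_update` is a stand-in for an LLM call that simply adopts the most common answer among its own and its neighbors' answers.

```python
from collections import Counter
from collections.abc import Callable


def ring_topology(n_agents: int, n_neighbors: int) -> dict[int, list[int]]:
    """Hypothetical sparse topology: each agent links to n_neighbors peers on a ring.

    Offsets +/-1, +/-2, ... are used; an odd degree also links the opposite agent,
    which requires an even number of agents. The resulting graph is undirected.
    """
    if not 0 < n_neighbors < n_agents:
        raise ValueError("need 0 < n_neighbors < n_agents")
    if n_neighbors % 2 == 1 and n_agents % 2 == 1:
        raise ValueError("an odd degree requires an even number of agents")
    offsets = []
    for d in range(1, n_neighbors // 2 + 1):
        offsets += [d, -d]
    if n_neighbors % 2 == 1:
        offsets.append(n_agents // 2)  # the diametrically opposite agent
    return {i: sorted((i + o) % n_agents for o in offsets) for i in range(n_agents)}


def toy_update(own: str, references: list[str]) -> str:
    """Stand-in for an LLM call: adopt the most common answer, keeping our own on ties."""
    counts = Counter([own] + references)
    if counts[own] == max(counts.values()):
        return own
    return counts.most_common(1)[0][0]


def run_debate(
    initial_answers: list[str],
    topology: dict[int, list[int]],
    update: Callable[[str, list[str]], str],
    n_rounds: int,
) -> tuple[str, list[str]]:
    """Run n_rounds of debate; return the majority-vote answer and the final answers."""
    answers = list(initial_answers)
    for _ in range(n_rounds):
        # Each agent sees only its neighbors' answers from the previous round.
        answers = [
            update(answers[i], [answers[j] for j in topology[i]])
            for i in range(len(answers))
        ]
    return Counter(answers).most_common(1)[0][0], answers


n_agents = 6
sparse = ring_topology(n_agents, 2)  # D = 2/5
full = ring_topology(n_agents, n_agents - 1)  # D = 1, fully connected
# Reference solutions read per round across all agents: 12 (sparse) vs 30 (full).
print(sum(len(v) for v in sparse.values()), sum(len(v) for v in full.values()))
majority, final = run_debate(["42", "42", "41", "42", "17", "41"], sparse, toy_update, n_rounds=2)
print(majority, final)
```

With six agents, the sparse ring has each agent read 2 reference solutions per round instead of 5, which is where the context savings come from. The final majority vote is one common aggregation choice; the actual framework's aggregation step may differ.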
### Main Contributions
1. **Effectiveness and Efficiency of Sparse Communication Topology**: The paper experimentally validates the effectiveness and efficiency of sparse communication topology in multi-agent debates, showing that sparse MAD can maintain or improve performance while reducing computational costs.
2. **Extension to Multimodal Reasoning Tasks**: The paper extends the MAD framework to multimodal reasoning tasks, demonstrating its advantages in handling tasks that combine vision and language.
3. **Application to Alignment Labeling Tasks**: The paper further applies MAD to alignment labeling tasks, showing that it improves labeling performance.
4. **Explanation of Sparsity**: The paper offers insights into why sparse MAD is effective, including the role of erroneous reference solutions and the longer span over which debate rounds remain effective.
5. **Optimization in Multi-LLM Settings**: The paper explores how to design communication topology to optimize overall performance in multi-LLM settings, finding that assigning stronger LLMs to agents with higher centrality can significantly enhance performance.
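The centrality finding in item 5 can be illustrated with a small assignment routine. The sketch below assumes degree centrality (the number of peers an agent communicates with, normalized by n_agents - 1) as the centrality measure and a hypothetical irregular six-agent topology; the paper's exact centrality measure, topologies, and models may differ, and the model names are placeholders.

```python
def degree_centrality(topology: dict[int, list[int]]) -> dict[int, float]:
    """Fraction of the other agents each agent communicates with."""
    n = len(topology)
    return {agent: len(peers) / (n - 1) for agent, peers in topology.items()}


def assign_models(
    topology: dict[int, list[int]], strong_model: str, weak_model: str, n_strong: int
) -> dict[int, str]:
    """Give the n_strong most central agents the stronger model (ties broken by agent id)."""
    centrality = degree_centrality(topology)
    ranked = sorted(topology, key=lambda agent: (-centrality[agent], agent))
    strong_agents = set(ranked[:n_strong])
    return {agent: strong_model if agent in strong_agents else weak_model for agent in topology}


# Hypothetical hub-like topology: agent 0 talks to everyone, the others to few peers.
topology = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3], 5: [0]}
print(assign_models(topology, "strong-llm", "weak-llm", n_strong=1))
```

Degree centrality is the simplest choice here because, in an undirected debate graph, an agent's degree is exactly how many other agents read its answer each round; other measures, such as betweenness centrality, could be substituted.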
### Experimental Results
- **Reasoning Tasks**: On the MATH and GSM8K datasets, sparse MAD substantially reduces inference costs while maintaining or improving accuracy. For example, on MATH, sparse MAD with connectivity D=2/5 (where D=1 denotes a fully connected topology) improved accuracy by 2% over fully connected MAD while reducing inference costs by over 40%.
- **Multimodal Reasoning Tasks**: On the MathVista dataset, sparse MAD also performed well, matching the accuracy of fully connected MAD while reducing token usage by up to 33.1%.
- **Alignment Labeling Tasks**: On the Anthropic-HH dataset, sparse MAD outperformed single-agent and self-consistency baselines on both the helpfulness and harmlessness tasks while significantly reducing costs.
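The cost savings reported above follow from the fact that, after the first round, most of each agent's input consists of its neighbors' previous solutions. The back-of-the-envelope model below makes that explicit. It is a simplification with made-up token counts (`prompt_tokens`, `solution_tokens`): it assumes every agent has the same number of neighbors and every answer has the same length, and it ignores each agent's own earlier answer. It is not the paper's cost accounting, and its output is not a reproduction of the reported numbers.

```python
from fractions import Fraction


def debate_tokens(
    n_agents: int, n_neighbors: int, n_rounds: int, prompt_tokens: int, solution_tokens: int
) -> int:
    """Rough total token count (input + output) for one debate under a uniform-length model."""
    total = 0
    for round_idx in range(n_rounds):
        # Round 0 answers the question alone; later rounds append the neighbors' previous solutions.
        n_refs = 0 if round_idx == 0 else n_neighbors
        input_tokens = prompt_tokens + n_refs * solution_tokens
        total += n_agents * (input_tokens + solution_tokens)  # every agent also generates a solution
    return total


def connectivity(n_agents: int, n_neighbors: int) -> Fraction:
    """D: fraction of the other agents each agent reads (D = 1 means fully connected)."""
    return Fraction(n_neighbors, n_agents - 1)


# Hypothetical setting: 6 agents, 3 rounds, 200-token prompt, 300-token solutions.
full = debate_tokens(6, 5, 3, prompt_tokens=200, solution_tokens=300)
sparse = debate_tokens(6, 2, 3, prompt_tokens=200, solution_tokens=300)
print(f"D={connectivity(6, 2)}: {1 - sparse / full:.1%} fewer tokens than D=1")
```

Because the prompt and each agent's own output do not shrink with sparsity, the token reduction in this model is always smaller than the reduction in reference solutions (here 2 instead of 5 per agent per debate round).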
### Conclusion
Through systematic experiments, the paper demonstrates that a sparse communication topology can substantially improve the efficiency and effectiveness of the multi-agent debate framework, and that the approach carries over to a range of tasks. These findings offer practical guidance for building more efficient and effective multi-agent systems.