Abstract: A multi-agent debate (MAD) system imitates the process of human discussion in pursuit of truth and aims to align the correct cognition of different agents to reach the optimal solution. It is challenging to make diverse agents reach correct and highly consistent cognition, because their knowledge backgrounds are limited and differ from one another (i.e., cognitive islands). This hinders the search for the optimal solution. To address this challenge, we propose a novel Multi-Agent Debate with Knowledge-Enhanced framework (MADKE) that helps the system find the solution. First, we introduce a shared retrieval knowledge pool into the debate process to address the problem of limited and differing knowledge backgrounds. Then, we propose an adaptive knowledge selection method to ensure the accuracy and personalization of knowledge. This method allows each agent to decide, in every conversation round, whether to use external knowledge according to its own needs. Experimental results on six datasets show that our method achieves state-of-the-art results compared with existing single-agent and multi-agent methods. Further analysis reveals that introducing retrieved knowledge helps agents break cognitive islands during the debate and effectively improves the consistency and correctness of the model. Moreover, MADKE with Qwen1.5-72B-Chat surpasses GPT-4 by +1.26% on average across the six datasets, which shows that our method can help open-source LLMs match or even surpass the performance of GPT-4. Our code is available at https://github.com/FutureForMe/MADKE.
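To make the idea of a shared knowledge pool with per-agent, per-round knowledge selection more concrete, here is a minimal Python sketch. It is not the MADKE implementation. The names `KnowledgePool`, `debate`, `make_toy_agent`, and `threshold` are hypothetical. The selection rule (an agent consults retrieved evidence only when its own confidence falls below a threshold), the stop-on-consensus rule, and the majority-vote fallback are all assumptions made for illustration. Real agents would be LLM calls, and retrieval would query an external corpus rather than a dictionary.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent signature: (question, evidence or None, previous-round peer answers)
# -> (answer, confidence). A real agent would wrap an LLM call.
AgentFn = Callable[[str, Optional[List[str]], List[str]], Tuple[str, float]]


@dataclass
class KnowledgePool:
    """Shared retrieval pool: every agent reads from the same evidence store."""
    passages: Dict[str, List[str]] = field(default_factory=dict)

    def retrieve(self, question: str, k: int = 2) -> List[str]:
        # Placeholder retrieval (assumption): the first k passages stored for the question.
        return self.passages.get(question, [])[:k]


def debate(question: str, agents: List[AgentFn], pool: KnowledgePool,
           max_rounds: int = 3, threshold: float = 0.7) -> str:
    evidence = pool.retrieve(question)
    answers: List[str] = []  # previous-round answers, empty before round 1
    for _ in range(max_rounds):
        new_answers = []
        for agent in agents:
            # Each agent first answers from its own knowledge, seeing last round's answers.
            answer, conf = agent(question, None, answers)
            # Assumed adaptive selection rule: re-answer with shared evidence only when unsure.
            if conf < threshold and evidence:
                answer, conf = agent(question, evidence, answers)
            new_answers.append(answer)
        answers = new_answers
        if len(set(answers)) == 1:  # all agents agree: stop early
            return answers[0]
    # No consensus within max_rounds: fall back to a majority vote.
    return Counter(answers).most_common(1)[0][0]


def make_toy_agent(prior: str, prior_conf: float) -> AgentFn:
    """Hypothetical stand-in for an LLM agent with a fixed prior belief."""
    def agent(question: str, evidence: Optional[List[str]],
              peers: List[str]) -> Tuple[str, float]:
        if evidence:
            # Toy behaviour: adopt the top retrieved passage with high confidence.
            return evidence[0], 0.9
        return prior, prior_conf
    return agent


if __name__ == "__main__":
    q = "Capital of Australia?"
    agents = [make_toy_agent("Sydney", 0.4), make_toy_agent("Sydney", 0.5),
              make_toy_agent("Canberra", 0.8)]
    print(debate(q, agents, KnowledgePool({q: ["Canberra"]})))  # prints Canberra
    print(debate(q, agents, KnowledgePool()))                   # prints Sydney
```

In the toy run, two agents share the same wrong prior, so without retrieval the debate settles on the majority's mistake, a small analogue of a cognitive island. When the shared pool holds relevant evidence, only the unsure agents consult it, and the group converges on the correct answer in the first round.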

M-MAD: Multidimensional Multi-Agent Debate Framework for Fine-grained Machine Translation Evaluation

MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate

MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria

Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level

Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

Emphasising Structured Information: Integrating Abstract Meaning Representation into LLMs for Enhanced Open-Domain Dialogue Evaluation

The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation