Abstract: Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs is the leader of AlpacaEval 2.0 by a substantial gap, achieving a score of 65.1% compared to 57.5% by GPT-4 Omni.

What problem does this paper attempt to address?

This paper proposes Mixture-of-Agents (MoA), a method for combining the abilities of multiple large language models (LLMs). As LLMs improve at natural language understanding and generation, integrating the expertise of different models remains an open challenge. The paper's key observation is that an LLM often produces a better response when it is shown the outputs of other models, even when those outputs are of lower quality than its own; the paper calls this property the collaborativeness of LLMs.

MoA exploits this property with a multi-layered structure: each layer consists of multiple LLM agents, and each agent uses the outputs of all agents in the previous layer as auxiliary information when generating its own response. With this design, MoA achieves state-of-the-art performance on AlpacaEval 2.0, MT-Bench, and FLASK. For example, using only open-source LLMs, MoA scores 65.1% on AlpacaEval 2.0, compared with 57.5% for GPT-4 Omni.

The paper also emphasizes choosing a diverse set of LLMs, with performance metrics and output diversity as the selection criteria. Combining models chosen this way mitigates the deficiencies of individual models and improves overall response quality, particularly in reasoning and language generation. Finally, the paper relates MoA to the Mixture-of-Experts method: MoA operates at the model level rather than the activation level, so it can use the existing interfaces of LLMs without modifying their internals.
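The layered flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Agent` type, the prompt wording in `build_prompt`, the use of a single final `aggregator`, and the `stub_agent` helper are all assumptions made for illustration, and a real agent would wrap a call to an actual LLM.

```python
from typing import Callable, List

# Hypothetical agent type: any function that maps a prompt to a response.
Agent = Callable[[str], str]


def build_prompt(query: str, previous: List[str]) -> str:
    """Attach the previous layer's responses to the query as auxiliary text.

    The wording of this prompt is an assumption, not taken from the paper.
    """
    if not previous:
        return query  # the first layer sees only the original query
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(previous, start=1))
    return (
        "Responses from other models to the query below:\n"
        f"{numbered}\n\n"
        "Use them as auxiliary information when answering.\n"
        f"Query: {query}"
    )


def mixture_of_agents(query: str, layers: List[List[Agent]], aggregator: Agent) -> str:
    """Run the query through each layer, then produce one final response."""
    previous: List[str] = []
    for layer in layers:
        # Every agent in a layer receives all outputs of the previous layer.
        prompt = build_prompt(query, previous)
        previous = [agent(prompt) for agent in layer]
    # Assumed design: a single aggregator reads the last layer's outputs.
    return aggregator(build_prompt(query, previous))


def stub_agent(name: str) -> Agent:
    """Stand-in for a real LLM call; returns a fixed, labeled string."""
    return lambda prompt: f"[{name}] reply to {len(prompt)}-char prompt"


if __name__ == "__main__":
    layers = [[stub_agent("a"), stub_agent("b")], [stub_agent("c"), stub_agent("d")]]
    print(mixture_of_agents("What is MoA?", layers, stub_agent("aggregator")))
```

Because each agent is just a prompt-to-response function, the sketch works at the model-interface level, which mirrors the summary's point that MoA needs no changes to model internals.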

Mixture-of-Agents Enhances Large Language Model Capabilities
