Abstract: The emergence of Large Language Models (LLMs) has fundamentally altered the way we interact with digital systems and has led to the pursuit of LLM-powered AI agents to assist in daily workflows. LLMs, whilst powerful and capable of demonstrating some emergent properties, are not logical reasoners and often struggle to perform well at all of the sub-tasks that an AI agent carries out to plan and execute a workflow. While existing studies tackle this lack of proficiency by generalised pretraining at a huge scale or by specialised fine-tuning for tool use, we assess whether a system comprising a coalition of pretrained LLMs, each exhibiting specialised performance at individual sub-tasks, can match the performance of single-model agents. The coalition-of-models approach showcases its potential for building robustness and reducing the operational costs of these AI agents by leveraging traits exhibited by specific models. Our findings demonstrate that the need for fine-tuning can be mitigated by considering a coalition of pretrained models, and we believe that this approach can be applied to other non-agentic systems which utilise LLMs.

What problem does this paper attempt to address?

The paper primarily explores how the combined use of multiple large language models (LLMs) can enhance the robustness and efficiency of AI agents. Specifically, the researchers propose a novel approach that utilizes a "coalition" of pre-trained large language models, each demonstrating specific advantages when performing a subtask in the AI agent workflow. Compared to traditional single-model approaches, this method has the following advantages:

1. **Cost Reduction**: It avoids the high costs associated with large-scale pre-training or fine-tuning a single model for specific tasks.
2. **Increased Accuracy**: By assigning different subtasks to the models best suited for them, the overall system accuracy is improved.
3. **Enhanced Flexibility**: When new models with better performance emerge, they can be easily integrated into the existing system without retraining the entire system.

### Research Background and Objectives

With the development of large language models, these models have been widely applied in various digital systems and used to build AI agents that assist with daily tasks. Although large language models are powerful and can exhibit some emergent properties, they are not logical reasoners and perform poorly in executing all the subtasks required for AI agent planning and execution workflows. To address this limitation, current research typically adopts two approaches: one is general pre-training on a massive scale; the other is specialized fine-tuning for tool usage tasks. The authors of this paper evaluate whether a coalition of multiple pre-trained large language models can achieve performance comparable to a single model without fine-tuning.

### Key Findings

- **Coalition Models Outperform Single Fine-Tuned Models**: Experiments on the ToolAlpaca dataset show that a coalition of multiple pre-trained models surpasses a single fine-tuned model in overall workflow accuracy, without incurring any additional costs associated with fine-tuning.
- **Different Models Excel in Specific Tasks**: The study finds that certain models perform better in specific subtasks such as planning, slot filling, and response generation. For example, the Mistral model excels in planning tasks, while the Mixtral model performs well in slot filling tasks.
- **Using the Best Model for Specific Tasks Can Reduce Costs**: By selecting the model best suited for a specific task, the overall system cost can be reduced while maintaining accuracy.

### Conclusion

This paper introduces an innovative approach to enhance the robustness and efficiency of LLM-based AI agents through a coalition of multiple pre-trained models. This method not only improves system accuracy but also reduces operational costs. Future research can further explore whether a coalition of multiple fine-tuned models can achieve state-of-the-art performance.
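
To make the coalition idea in this summary more concrete, the following is a minimal sketch of routing each subtask (planning, slot filling, response generation) to whichever candidate model suits it best, preferring a cheaper model when its accuracy is close enough. It is not the paper's implementation. Everything in it is an assumption made for illustration: the `Candidate` type, the `pick_model`, `build_coalition`, and `run_workflow` helpers, the `tolerance` parameter, the stub models, and all accuracy and cost numbers, which are placeholders rather than reported results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a "model" is any callable mapping input text to output text.
Model = Callable[[str], str]


@dataclass
class Candidate:
    name: str
    model: Model
    accuracy: float  # hypothetical per-subtask validation accuracy in [0, 1]
    cost: float      # hypothetical relative cost per call


def pick_model(candidates: List[Candidate], tolerance: float = 0.0) -> Candidate:
    """Return the cheapest candidate whose accuracy is within `tolerance` of the best."""
    best = max(c.accuracy for c in candidates)
    eligible = [c for c in candidates if c.accuracy >= best - tolerance]
    return min(eligible, key=lambda c: c.cost)


def build_coalition(pool: Dict[str, List[Candidate]],
                    tolerance: float = 0.0) -> Dict[str, Candidate]:
    """Choose one model per subtask from the candidate pool."""
    return {subtask: pick_model(cands, tolerance) for subtask, cands in pool.items()}


def run_workflow(coalition: Dict[str, Candidate], query: str) -> str:
    """Chain the three subtasks, each handled by its selected model."""
    plan = coalition["planning"].model(query)
    call = coalition["slot_filling"].model(plan)
    return coalition["response_generation"].model(call)


def stub(tag: str) -> Model:
    """Stand-in for a real LLM call; it only wraps the input in a tag."""
    return lambda text: f"{tag}[{text}]"


# Placeholder scores, not measurements from the paper.
POOL: Dict[str, List[Candidate]] = {
    "planning": [
        Candidate("model_a", stub("a-plan"), accuracy=0.90, cost=1.0),
        Candidate("model_b", stub("b-plan"), accuracy=0.70, cost=2.0),
    ],
    "slot_filling": [
        Candidate("model_a", stub("a-slots"), accuracy=0.60, cost=1.0),
        Candidate("model_b", stub("b-slots"), accuracy=0.85, cost=2.0),
    ],
    "response_generation": [
        Candidate("model_a", stub("a-reply"), accuracy=0.78, cost=1.0),
        Candidate("model_b", stub("b-reply"), accuracy=0.80, cost=2.0),
    ],
}

if __name__ == "__main__":
    coalition = build_coalition(POOL, tolerance=0.05)
    for subtask, chosen in coalition.items():
        print(subtask, "->", chosen.name)
    print(run_workflow(coalition, "weather in Paris"))
```

With a non-zero `tolerance`, the sketch trades a small amount of accuracy for lower cost on subtasks where the candidates are nearly tied. Swapping in a newly released model only requires adding another `Candidate` to the pool, which mirrors the flexibility argument above.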

Coalitions of Large Language Models Increase the Robustness of AI Agents

Mixture-of-Agents Enhances Large Language Model Capabilities

Large Language Model Evaluation Via Multi AI Agents: Preliminary results

Organizing a Society of Language Models: Structures and Mechanisms for Enhanced Collective Intelligence

Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate

Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents

The Importance of Understanding Language in Large Language Models

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate

Modelling Political Coalition Negotiations Using LLM-based Agents

Enhancing Pipeline-Based Conversational Agents with Large Language Models

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

Theory of Mind for Multi-Agent Collaboration via Large Language Models

Large Language Models with Controllable Working Memory

Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards

Synergistic Integration of Large Language Models and Cognitive Architectures for Robust AI: An Exploratory Analysis

A Survey of Large Language Models

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Supervised Knowledge Makes Large Language Models Better In-context Learners

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration