Coalitions of Large Language Models Increase the Robustness of AI Agents

Prattyush Mangal, Carol Mak, Theo Kanakis, Timothy Donovan, Dave Braines, Edward Pyzer-Knapp
2024-08-03
Abstract: The emergence of Large Language Models (LLMs) has fundamentally altered the way we interact with digital systems and has led to the pursuit of LLM-powered AI agents to assist in daily workflows. LLMs, whilst powerful and capable of demonstrating some emergent properties, are not logical reasoners and often struggle to perform well at all sub-tasks carried out by an AI agent to plan and execute a workflow. While existing studies tackle this lack of proficiency by generalised pretraining at a huge scale or by specialised fine-tuning for tool use, we assess whether a system comprising a coalition of pretrained LLMs, each exhibiting specialised performance at individual sub-tasks, can match the performance of single-model agents. The coalition of models approach showcases its potential for building robustness and reducing the operational costs of these AI agents by leveraging traits exhibited by specific models. Our findings demonstrate that fine-tuning can be mitigated by considering a coalition of pretrained models, and we believe that this approach can be applied to other non-agentic systems which utilise LLMs.
Computation and Language
What problem does this paper attempt to address?
The paper explores how combining multiple large language models (LLMs) can make AI agents more robust and efficient. Specifically, the researchers propose using a "coalition" of pre-trained LLMs, each assigned to the subtask of the agent workflow at which it performs best. Compared with traditional single-model approaches, this method has the following advantages:

1. **Cost Reduction**: It avoids the high costs associated with large-scale pre-training or fine-tuning a single model for specific tasks.
2. **Increased Accuracy**: By assigning different subtasks to the models best suited for them, the overall system accuracy is improved.
3. **Enhanced Flexibility**: When new models with better performance emerge, they can be integrated into the existing system without retraining the entire system.

### Research Background and Objectives

Large language models are now widely applied in digital systems and used to build AI agents that assist with daily tasks. Although these models are powerful and can exhibit some emergent properties, they are not logical reasoners, and a single model often struggles to perform well at every subtask an AI agent needs in order to plan and execute a workflow. Current research typically addresses this limitation in one of two ways: general pre-training on a massive scale, or specialized fine-tuning for tool use. The authors of this paper evaluate whether a coalition of multiple pre-trained large language models, used without fine-tuning, can match the performance of a single-model agent.

### Key Findings

- **Coalition Models Outperform Single Fine-Tuned Models**: Experiments on the ToolAlpaca dataset show that a coalition of multiple pre-trained models surpasses a single fine-tuned model in overall workflow accuracy, without incurring the costs associated with fine-tuning.
- **Different Models Excel in Specific Tasks**: The study finds that certain models perform better in specific subtasks such as planning, slot filling, and response generation. For example, the Mistral model excels in planning tasks, while the Mixtral model performs well in slot filling tasks.
- **Using the Best Model for Specific Tasks Can Reduce Costs**: By selecting the model best suited for a specific task, the overall system cost can be reduced while maintaining accuracy.

### Conclusion

This paper introduces an approach that improves the robustness and efficiency of LLM-based AI agents by using a coalition of multiple pre-trained models. The method both improves system accuracy and reduces operational costs. Future research could explore whether a coalition of multiple fine-tuned models can achieve state-of-the-art performance.
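To make the coalition idea concrete, here is a minimal sketch of per-subtask routing in Python. It is an illustration under stated assumptions, not the authors' implementation: the `Coalition` class, the `stub_model` helper, the subtask keys, and the `"responder"` model name are all hypothetical, and the stub models only tag their input instead of calling a real LLM. The Mistral and Mixtral assignments follow the summary above; the summary does not name a model for response generation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is any callable mapping a prompt to text. Real LLM calls
# would replace these stubs; the stubs only make the sketch runnable.
Model = Callable[[str], str]


def stub_model(name: str) -> Model:
    """Return a fake model that prefixes its input with its own name."""
    def generate(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return generate


@dataclass
class Coalition:
    """Routes each agent subtask to the model assigned to it (hypothetical)."""
    routes: Dict[str, Model]

    def run(self, subtask: str, prompt: str) -> str:
        # Fail loudly if no model has been assigned to this subtask.
        if subtask not in self.routes:
            raise KeyError(f"no model assigned to subtask {subtask!r}")
        return self.routes[subtask](prompt)

    def run_workflow(self, user_request: str) -> str:
        # Chain the three subtasks named in the summary, each handled
        # by a potentially different model.
        plan = self.run("planning", user_request)
        call = self.run("slot_filling", plan)
        return self.run("response_generation", call)


coalition = Coalition(routes={
    "planning": stub_model("mistral"),               # per the summary
    "slot_filling": stub_model("mixtral"),           # per the summary
    "response_generation": stub_model("responder"),  # hypothetical choice
})

print(coalition.run_workflow("What is the weather in Paris?"))
# prints "[responder] [mixtral] [mistral] What is the weather in Paris?"
```

Because the routing table is plain data, swapping in a newer model for one subtask only changes a single entry in `routes`, which is the flexibility benefit the summary describes.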