AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, Christos Kozyrakis
2024-11-06
Abstract: With more advanced natural language understanding and reasoning capabilities, large language model (LLM)-powered agents are increasingly developed in simulated environments to perform complex tasks, interact with other agents, and exhibit emergent behaviors relevant to social science and gaming. However, current multi-agent simulations frequently suffer from inefficiencies due to the limited parallelism caused by false dependencies, resulting in performance bottlenecks. In this paper, we introduce AI Metropolis, a simulation engine that improves the efficiency of LLM agent simulations by incorporating out-of-order execution scheduling. By dynamically tracking real dependencies between agents, AI Metropolis minimizes false dependencies, enhancing parallelism and enabling efficient hardware utilization. Our evaluations demonstrate that AI Metropolis achieves speedups from 1.3x to 4.15x over standard parallel simulation with global synchronization, approaching optimal performance as the number of agents increases.
Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning; Multiagent Systems
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the inefficiency of large language model (LLM) agents in multi-agent simulations. Current simulations often hit performance bottlenecks because false dependencies limit parallelism. The paper introduces a simulation engine called AI Metropolis, which improves the efficiency of LLM agent simulations through out-of-order execution scheduling. AI Metropolis dynamically tracks the actual dependencies between agents, minimizes false dependencies, enhances parallelism, and achieves efficient hardware utilization.

### Main Challenges

1. **False Dependencies**: Traditional multi-agent simulation methods enforce global synchronization, so many agents wait unnecessarily at each step, which limits parallelism.
2. **Unbalanced Workload**: Agents carry uneven task loads in the simulation, which reduces parallelism.
3. **Inference Time Dominance**: The running time of an LLM agent simulation is dominated by inference, so high throughput is required to shorten completion time and reduce cost.
4. **Critical Path Requests**: Simulation tasks contain long critical paths, and requests on those paths need to be prioritized to minimize overall completion time.

### Solutions

AI Metropolis addresses these issues with the following methods:

1. **Out-of-Order Execution Scheduling**: Allows agents to progress at different speeds based on their task loads, reducing frequent global synchronization and improving parallelism.
2. **Dependency Tracking**: Analyzes the spatiotemporal relationships between agents to dynamically track actual dependencies, eliminating most false dependencies.
3. **Cluster Management**: Groups coupled agents into clusters, with each cluster acting as a minimal synchronization unit, reducing false dependencies and scheduling overhead.
4. **Priority Scheduling**: Prioritizes requests based on the time steps of tasks, ensuring that tasks on the critical path are completed first, further enhancing parallelism.

### Experimental Results

The paper validates the effectiveness of AI Metropolis through experiments:

- **Enhanced Parallelism**: By tracking actual dependencies, AI Metropolis improves parallelism and shortens simulation completion time, with reported speedups from 1.3x to 4.15x over standard parallel simulation with global synchronization.
- **Scalability**: AI Metropolis scales well as the size of the simulated world and the number of agents increase.
- **Performance Comparison**: Although AI Metropolis does not eliminate all false dependencies, its performance approaches the optimum as the number of agents increases.

### Conclusion

AI Metropolis addresses efficiency issues in multi-agent simulations by introducing out-of-order execution scheduling and dependency tracking, improving parallelism and hardware utilization, and demonstrating good scalability and performance advantages.
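
### Illustrative Sketch

The cluster management and priority scheduling ideas listed under Solutions can be illustrated with a small Python sketch. It is not the paper's implementation. The `Agent` fields, the `radius` parameter, and the functions `coupled`, `build_clusters`, and `dispatch_order` are hypothetical names introduced here. The coupling rule is a deliberate simplification: it treats two agents as coupled when they are within `radius` grid cells of each other and ignores the temporal part of the spatiotemporal analysis the summary mentions.

```python
import heapq
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Agent:
    name: str
    x: int
    y: int
    step: int  # simulation time step the agent is waiting to execute


def coupled(a: Agent, b: Agent, radius: int) -> bool:
    """Assumed rule: agents within `radius` cells (Chebyshev distance) interact."""
    return max(abs(a.x - b.x), abs(a.y - b.y)) <= radius


def build_clusters(agents: list[Agent], radius: int) -> list[list[Agent]]:
    """Group transitively coupled agents using a small union-find."""
    parent = list(range(len(agents)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(agents)), 2):
        if coupled(agents[i], agents[j], radius):
            parent[find(i)] = find(j)

    groups: dict[int, list[Agent]] = {}
    for i, agent in enumerate(agents):
        groups.setdefault(find(i), []).append(agent)
    return list(groups.values())


def dispatch_order(clusters: list[list[Agent]]) -> list[list[str]]:
    """Return cluster member names, earliest time step first."""
    heap = [(min(a.step for a in c), idx) for idx, c in enumerate(clusters)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, idx = heapq.heappop(heap)
        order.append(sorted(a.name for a in clusters[idx]))
    return order


if __name__ == "__main__":
    world = [
        Agent("a", 0, 0, 3),
        Agent("b", 1, 1, 3),
        Agent("c", 10, 10, 5),
        Agent("d", 2, 2, 3),
        Agent("e", 20, 20, 2),
    ]
    for names in dispatch_order(build_clusters(world, radius=1)):
        print(names)
```

In this toy world, agents `a`, `b`, and `d` form one cluster through transitive proximity, while `c` and `e` are isolated. Only agents inside the same cluster need to synchronize with one another, and the cluster holding the earliest pending step is dispatched first, which mirrors the idea of serving the agents that lag furthest behind before the others.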