Towards Efficient Multi-Agent Learning Systems

Kailash Gogineni, Peng Wei, Tian Lan, Guru Venkataramani
2023-05-24
Abstract: Multi-Agent Reinforcement Learning (MARL) is an increasingly important research field that can model and control multiple large-scale autonomous systems. Despite its achievements, existing multi-agent learning methods typically involve expensive computations in terms of training time and power, arising from large observation-action spaces and a huge number of training steps. Therefore, a key challenge is understanding and characterizing the computationally intensive functions in several popular classes of MARL algorithms during their training phases. Our preliminary experiments reveal new insights into the key modules of MARL algorithms that limit the adoption of MARL in real-world systems. We explore a neighbor sampling strategy to improve cache locality and observe a performance improvement ranging from 26.66% (3 agents) to 27.39% (12 agents) during the computationally intensive mini-batch sampling phase. Additionally, we demonstrate that improving locality leads to an end-to-end training time reduction of 10.2% (for 12 agents) compared to existing multi-agent algorithms, without significant degradation in the mean reward.
Multiagent Systems
What problem does this paper attempt to address?
This paper addresses the cost of training in multi-agent reinforcement learning (MARL): as the number of agents increases, training becomes extremely time-consuming and computationally expensive. Existing multi-agent learning methods are costly in both training time and power consumption, mainly because of the large observation-action space and the large number of training steps. The key challenge is therefore to understand and characterize the computationally intensive functions of several popular MARL algorithms during the training phase.

To meet this challenge, the authors conduct a workload characterization study of the performance-limiting functions in several well-known model-free MARL frameworks. These frameworks use the actor-critic method, and their state spaces are usually very large. By analyzing the different MARL training phases, the authors identify key modules that limit the adoption of MARL in practical systems. In the mini-batch sampling phase, a neighbor sampling strategy improves cache locality and yields a performance improvement ranging from 26.66% (3 agents) to 27.39% (12 agents). Improving locality also reduces end-to-end training time by 10.2% (for 12 agents) compared with existing multi-agent algorithms, without a significant drop in the mean reward.

Overall, the main contribution of the paper is a systematic hardware-software performance analysis that identifies the key performance bottlenecks in the training phase of multi-agent systems. The authors also explore a neighbor sampling strategy that improves the locality of data access and thereby reduces the time spent in mini-batch sampling and in training overall.
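The text above does not describe how the neighbor sampling strategy is implemented. The following is only a minimal sketch of the general idea, under the assumption that transitions live in a flat NumPy replay buffer: instead of drawing every mini-batch index independently, it draws a few random anchor indices and takes a short run of adjacent entries after each one, so that consecutive reads touch nearby memory. The names `random_indices`, `neighbor_indices`, and `num_neighbors` are hypothetical and do not come from the paper.

```python
import numpy as np


def random_indices(buffer_size, batch_size, rng):
    """Baseline: every index is drawn independently (scattered memory reads)."""
    return rng.integers(0, buffer_size, size=batch_size)


def neighbor_indices(buffer_size, batch_size, num_neighbors, rng):
    """Illustrative neighbor sampling (an assumption, not the paper's code).

    Draws batch_size // num_neighbors random anchors and, for each anchor,
    returns the anchor plus the next num_neighbors - 1 adjacent indices.
    """
    if num_neighbors < 1 or num_neighbors > buffer_size:
        raise ValueError("num_neighbors must be in [1, buffer_size]")
    if batch_size % num_neighbors != 0:
        raise ValueError("batch_size must be a multiple of num_neighbors")

    num_anchors = batch_size // num_neighbors
    # Upper bound keeps every run of neighbors inside the buffer.
    anchors = rng.integers(0, buffer_size - num_neighbors + 1, size=num_anchors)
    offsets = np.arange(num_neighbors)
    # Broadcasting builds one contiguous run per anchor, then flattens.
    return (anchors[:, None] + offsets[None, :]).reshape(-1)


def sample_batch(buffer, indices):
    """Gather the selected transitions from a NumPy-backed replay buffer."""
    return buffer[indices]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical buffer: 10,000 transitions with 64 float32 features each.
    buffer = rng.standard_normal((10_000, 64)).astype(np.float32)

    scattered = sample_batch(buffer, random_indices(len(buffer), 1024, rng))
    local = sample_batch(buffer, neighbor_indices(len(buffer), 1024, 8, rng))
    print(scattered.shape, local.shape)
```

In this sketch, `num_neighbors` trades locality against sample diversity: longer runs make reads more contiguous but make the transitions within a batch more correlated, which is one plausible reason the source reports the effect on mean reward alongside the speedup.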