Imitation Learning based Alternative Multi-Agent Proximal Policy Optimization for Well-Formed Swarm-Oriented Pursuit Avoidance
Sizhao Li, Yuming Xiang, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
2023-11-06
Abstract: Multi-Robot System (MRS) has garnered widespread research interest and fostered a great many interesting applications, especially in cooperative control. Yet little light has been shed on the compound ability of formation, monitoring and defence in decentralized large-scale MRS for pursuit avoidance, which puts stringent requirements on the capability of coordination and adaptability. In this paper, we put forward a decentralized Imitation learning based Alternative Multi-Agent Proximal Policy Optimization (IA-MAPPO) algorithm to provide a flexible and communication-economic solution for executing the pursuit avoidance task in a well-formed swarm. In particular, a policy-distillation based MAPPO executor is first devised to accomplish, and swiftly switch between, multiple formations in a centralized manner. Furthermore, we utilize imitation learning to decentralize the formation controller, so as to reduce communication overhead and enhance scalability. Afterwards, alternative training is leveraged to compensate for the performance loss incurred by decentralization. The simulation results validate the effectiveness of IA-MAPPO, and extensive ablation experiments further show performance comparable to a centralized solution with a significant decrease in communication overhead.
Artificial Intelligence, Robotics
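The abstract describes a policy-distillation based MAPPO executor that can carry out, and switch between, several formations. Below is a minimal sketch of the general distillation technique. It assumes a discrete action space and one already-trained teacher policy per formation. The loss is the mean KL divergence from each teacher's action distribution to that of a single student conditioned on the formation. The formation names and logits are hypothetical, and this is not the paper's exact loss or network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over formations.

    teacher_logits: {formation_name: logits from that formation's teacher}
    student_logits: {formation_name: logits from the single student policy
                     when it is conditioned on that formation}
    """
    losses = [
        kl_divergence(softmax(t), softmax(student_logits[name]))
        for name, t in teacher_logits.items()
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical formations and logits for a 3-action agent at one state.
    teachers = {"line": [2.0, 0.5, -1.0], "wedge": [-1.0, 0.5, 2.0]}
    untrained_student = {"line": [0.0, 0.0, 0.0], "wedge": [0.0, 0.0, 0.0]}
    print(distillation_loss(teachers, untrained_student))  # positive
    print(distillation_loss(teachers, teachers))            # 0.0
```

Minimizing this loss over many sampled states pushes one student to reproduce every teacher. Switching formations then only means changing the conditioning input.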
What problem does this paper attempt to address?
The paper addresses how to achieve flexible, communication-economic formation, monitoring, and pursuit avoidance in multi-robot systems (MRS). Specifically, the authors study how a large-scale, decentralized MRS can coordinate and adapt effectively to complete pursuit avoidance tasks while preserving the flexibility and defensive capability of its formation. This imposes stringent requirements on coordination and adaptability, especially when the swarm must switch quickly between formations to respond to different situations.
The paper proposes an Imitation learning based Alternative Multi-Agent Proximal Policy Optimization algorithm (IA-MAPPO). It aims to provide a flexible, communication-economic solution for performing pursuit avoidance in a well-formed swarm. With this method, the authors seek to retain performance comparable to that of a centralized solution while reducing communication overhead.
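IA-MAPPO builds on MAPPO. MAPPO trains each agent's actor with PPO's clipped surrogate objective. In its standard form, it estimates advantages with a centralized critic that sees the global state during training.

The sketch below computes that clipped objective for a batch of samples pooled across agents. It assumes the advantages are already computed and that all agents share one actor. `clip_eps=0.2` is a common default, not a value taken from the paper.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate L^CLIP (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the pre-update policy, one entry per sample.
    advantages: advantage estimates for the same samples (assumed to come
    from a centralized critic, as in MAPPO-style training).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                         # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)                  # pessimistic bound
    return total / len(advantages)


if __name__ == "__main__":
    # Samples pooled from all agents, assuming they share one actor network.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))  # ratio 2.0 is clipped to 1.2
```

Taking the minimum of the clipped and unclipped terms means a single update gains nothing from pushing the probability ratio outside `[1 - clip_eps, 1 + clip_eps]` when the advantage is positive. This keeps each policy step conservative.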
In summary, this paper mainly solves the following problems:
1. **Formation flexibility**: How to switch rapidly between multiple formations in a multi-robot system to meet different task requirements.
2. **Communication efficiency**: How to reduce communication overhead and improve scalability in a decentralized multi-robot system.
3. **Coordination and adaptability**: How to design an effective coordination mechanism so that a multi-robot system adapts to environmental changes and keeps its formation stable and secure during pursuit avoidance.
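For communication efficiency, the abstract says imitation learning is used to decentralize the formation controller. Below is a minimal behavior-cloning sketch under purely illustrative assumptions:

- A hypothetical centralized teacher, a proportional controller that reads the global state, labels the actions.
- A decentralized linear student sees only the agent's own offset to its formation slot.
- The student is fitted to the teacher's labels by stochastic gradient descent on squared error.

In the paper, the teacher is the MAPPO executor and the student is a learned policy. Neither is reproduced here.

```python
import random


def teacher_action(state, agent_id, gain=0.5):
    """Hypothetical centralized controller: reads the global state to steer
    an agent toward its assigned formation slot (1-D for brevity)."""
    return gain * (state["slots"][agent_id] - state["positions"][agent_id])


def local_observation(state, agent_id):
    """Hypothetical local sensing: the agent's own offset to its slot."""
    return state["slots"][agent_id] - state["positions"][agent_id]


def behavior_clone(num_agents=4, num_steps=5000, lr=0.005, seed=0):
    """Fit a linear student a = w * obs + b to the teacher's actions by SGD
    on the squared error 0.5 * (a - a_teacher) ** 2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(num_steps):
        # Sample a random swarm configuration (positions and slots in [-5, 5]).
        state = {
            "positions": [rng.uniform(-5.0, 5.0) for _ in range(num_agents)],
            "slots": [rng.uniform(-5.0, 5.0) for _ in range(num_agents)],
        }
        agent = rng.randrange(num_agents)
        obs = local_observation(state, agent)   # decentralized input, no messages
        target = teacher_action(state, agent)   # centralized label
        err = (w * obs + b) - target
        w -= lr * err * obs   # d/dw of 0.5 * err**2
        b -= lr * err         # d/db of 0.5 * err**2
    return w, b


if __name__ == "__main__":
    w, b = behavior_clone()
    print(f"student: a = {w:.3f} * obs + {b:.3f}")
```

In this toy setting, the local offset fully determines the teacher's action, so the student converges to w ≈ 0.5 and b ≈ 0. In a real swarm, an agent's local view is incomplete. The resulting performance loss from decentralization is what the abstract says alternative training is meant to compensate for.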