Improving Sample Efficiency of Multiagent Reinforcement Learning with Nonexpert Policy for Flocking Control.

Yunbo Qiu, Yue Jin, Lebin Yu, Jian Wang, Yu Wang, Xudong Zhang
DOI: https://doi.org/10.1109/jiot.2023.3240671
IF: 10.6
2023-01-01
IEEE Internet of Things Journal
Abstract: Control algorithms for multiagent systems (MAS) have been applied to many Internet of Things devices, such as unmanned aerial vehicles and autonomous underwater vehicles. Flocking control is a crucial problem in MAS for enhancing the safety and cooperativity of agents: it requires the agents to maintain the flock while navigating to a target position and avoiding collisions. Compared with traditional algorithms, methods based on multiagent reinforcement learning (MARL) can solve the flocking control problem more flexibly and adapt to more complex environments. However, MARL-based methods demand a huge number of interactions between the agents and the environment, which makes them sample inefficient. In this article, we propose nonexpert policy-aided MARL (NPA-MARL) to improve sample efficiency. NPA-MARL combines a fundamental MARL algorithm with a prior policy whose performance may be nonexpert. Before online MARL training, NPA-MARL uses the nonexpert policy to generate demonstrations for pretraining the agents, while preventing overfitting to the demonstrations. During online training, NPA-MARL instructs the agents to imitate the nonexpert policy when the agents judge the nonexpert policy to be better. We apply NPA-MARL to the flocking control problem. Experimental results show that NPA-MARL improves both sample efficiency and policy performance in flocking control. In addition, NPA-MARL scales to more agents and is flexible in the choice of the nonexpert policy and the fundamental MARL algorithm.
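The online-training rule in the abstract (imitate the nonexpert policy only where the agents judge it to be better) can be illustrated with a minimal sketch. The abstract does not state how that judgment is made; the sketch assumes it is a comparison of the agent's own learned critic values, which is an assumption and not necessarily the authors' criterion. The names `imitation_mask`, `gated_imitation_loss`, `q_value`, and `margin` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical critic signature: (observation, action) -> estimated return.
Critic = Callable[[Vector, Vector], float]


def imitation_mask(q_value: Critic,
                   observations: Sequence[Vector],
                   agent_actions: Sequence[Vector],
                   prior_actions: Sequence[Vector],
                   margin: float = 0.0) -> List[float]:
    """Return 1.0 where the critic rates the nonexpert (prior) action
    above the agent's own action by more than `margin`, else 0.0.
    Assumption: the critic stands in for the agents' own judgment."""
    mask = []
    for obs, a_agent, a_prior in zip(observations, agent_actions, prior_actions):
        better = q_value(obs, a_prior) > q_value(obs, a_agent) + margin
        mask.append(1.0 if better else 0.0)
    return mask


def gated_imitation_loss(q_value: Critic,
                         observations: Sequence[Vector],
                         agent_actions: Sequence[Vector],
                         prior_actions: Sequence[Vector],
                         margin: float = 0.0) -> float:
    """Squared distance between agent and prior actions, counted only on
    samples selected by the mask and averaged over the whole batch."""
    mask = imitation_mask(q_value, observations, agent_actions,
                          prior_actions, margin)
    if not mask:
        return 0.0
    total = 0.0
    for m, a_agent, a_prior in zip(mask, agent_actions, prior_actions):
        total += m * sum((x - y) ** 2 for x, y in zip(a_agent, a_prior))
    return total / len(mask)
```

In a full training loop, a term like this would be added to the loss of the underlying MARL algorithm, so that imitation fades out automatically on states where the learned policy already scores higher than the nonexpert policy.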