Research on Generalization of Multi-agent Based on Reinforcement Learning

GUO Xin, WANG Wei, QING Wei, LI Jian, HE Zhao-feng
DOI: https://doi.org/10.3969/j.issn.1673-629x.2023.04.017
2023-01-01
Abstract: In multi-agent reinforcement learning, the training and testing environments often differ, so agents must learn to cope with the performance degradation caused when other agents in the environment change their policies. This generalization problem has attracted wide attention from researchers. To address it, a human-preference-based multi-agent role policy ensemble is proposed that accounts for the effects of both long-term and immediate rewards. This lets the algorithm determine the direction of policy updates, avoiding excessive exploration and ineffective training. In addition, agents are classified into roles according to the immediate rewards of their historical actions, and agents with the same role share parameters, which improves efficiency and makes the multi-agent algorithm scalable. Comparisons with existing algorithms in the multi-agent particle environment show that the proposed algorithm converges faster, effectively trains the optimal policy, and generalizes better to unknown environments.
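The abstract gives no formulas, so the following is only a minimal Python sketch of two of its ideas under stated assumptions, not the authors' method. It assumes (1) that long-term and immediate rewards are combined by a simple weighted sum with a hypothetical weight `alpha`, and (2) that "classifying agents into roles by the immediate rewards of historical actions" can be approximated by 1-D k-means on each agent's mean immediate reward. The function names `blended_signal`, `assign_roles` and `share_parameters` are illustrative only. Parameter sharing is shown by giving every agent in a role a reference to the same parameter object.

```python
from statistics import mean


def blended_signal(immediate, long_term, alpha=0.5):
    """Hypothetical blend of an immediate reward and a long-term return.

    ASSUMPTION: a plain weighted sum; alpha is an illustrative knob,
    not a value taken from the paper.
    """
    return alpha * long_term + (1.0 - alpha) * immediate


def assign_roles(reward_history, n_roles=2, n_iters=20):
    """Group agents into roles via 1-D k-means on their mean immediate reward.

    ASSUMPTION: k-means stands in for the paper's unspecified role classifier.
    """
    agents = sorted(reward_history)
    scores = {a: mean(reward_history[a]) for a in agents}
    lo, hi = min(scores.values()), max(scores.values())
    # Spread the initial centroids evenly between the lowest and highest score.
    if n_roles == 1:
        centroids = [lo]
    else:
        centroids = [lo + (hi - lo) * i / (n_roles - 1) for i in range(n_roles)]
    roles = {}
    for _ in range(n_iters):
        # Assign each agent to the nearest centroid (ties go to the lower index).
        new_roles = {
            a: min(range(n_roles), key=lambda r: abs(scores[a] - centroids[r]))
            for a in agents
        }
        if new_roles == roles:
            break  # assignments are stable, so the clustering has converged
        roles = new_roles
        # Move each centroid to the mean score of its members; keep it if empty.
        for r in range(n_roles):
            members = [scores[a] for a in agents if roles[a] == r]
            if members:
                centroids[r] = mean(members)
    return roles


def share_parameters(roles, init_params):
    """Give every agent with the same role a reference to one shared parameter set."""
    role_params = {r: init_params() for r in sorted(set(roles.values()))}
    return {a: role_params[r] for a, r in roles.items()}


if __name__ == "__main__":
    # Toy reward histories (made-up numbers for illustration only).
    history = {
        "a0": [1.0, 0.9, 1.1],
        "a1": [0.1, 0.0, 0.2],
        "a2": [1.2, 1.0, 0.8],
        "a3": [0.0, 0.1, -0.1],
    }
    roles = assign_roles(history, n_roles=2)
    params = share_parameters(roles, lambda: {"weights": [0.0, 0.0]})
    print(roles)  # {'a0': 1, 'a1': 0, 'a2': 1, 'a3': 0}
    print(params["a0"] is params["a2"])  # True
```

Because same-role agents hold one shared object, an update made through any of them is seen by all, which is the efficiency and scalability argument in the abstract: the number of parameter sets grows with the number of roles rather than the number of agents.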