Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning

SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun
DOI: https://doi.org/10.11896/jsjkx.210700100
2022-01-01
Computer Science
Abstract: Most multi-agent reinforcement learning (MARL) algorithms that adopt the centralized training and decentralized execution (CTDE) architecture currently perform well in homogeneous multi-agent systems. However, in heterogeneous multi-agent systems composed of agents with different roles, the credit assignment problem persists and makes it difficult for agents to learn effective cooperation strategies. To address this problem, an adaptive reward method for end-to-end cooperation based on multi-agent reinforcement learning is proposed to promote cooperation between agents. First, a batch regularization network is proposed. It uses a graph neural network to model the cooperative relationships among heterogeneous agents, an attention mechanism to compute the weights of key information, and batch regularization to generate feature vectors. This guides the algorithm to learn in the right direction and thereby improves the performance of cooperative strategy generation for heterogeneous multi-agent systems. Second, an adaptive intrinsic reward network based on the actor-critic method is proposed. It converts sparse rewards into dense rewards, which guide agents to generate cooperative strategies according to the current situation on the field. In experiments, the proposed method achieves significantly better results than current mainstream multi-agent reinforcement learning algorithms in the “cooperative-game” scenario. In addition, a visual analysis of the strategy-reward-behavior correlation further verifies the effectiveness of the proposed method.
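The two ideas named in the abstract, attention weights over other agents' information and a dense signal built from a sparse reward, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scaled dot-product form of the attention, the additive reward combination, and the coefficient `beta` are all assumptions made for illustration, and the intrinsic term is a fixed placeholder number where the paper uses a learned network.

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot-product scores (assumed attention form)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(weights, values):
    """Attention-weighted sum of the other agents' feature vectors."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def shaped_reward(extrinsic, intrinsic, beta=0.1):
    """Hypothetical dense signal: sparse team reward plus a scaled intrinsic term."""
    return extrinsic + beta * intrinsic

# Toy usage: one agent attends to two neighbours with 2-d features.
query = [1.0, 0.0]
neighbours = [[1.0, 0.0], [0.0, 1.0]]
w = attention_weights(query, neighbours)
message = aggregate(w, neighbours)
r = shaped_reward(extrinsic=0.0, intrinsic=0.5)  # non-zero even when the team reward is 0
```

In this toy case the neighbour whose features align with the query receives the larger weight, and the agent still gets a training signal on steps where the sparse environment reward is zero.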