Distributed Multi-agent Soft Actor-Critic Algorithm With Probabilistic Prioritized Experience Replay

ZHANG Yanxin, KONG Han, YIN Chenkun, WANG Zihao, HUANG Zhiqing
DOI: https://doi.org/10.11936/bjutxb2022110019
2023-01-01
Abstract: To meet the heavy demand for interaction data in practical multi-agent tasks, a multi-agent soft Actor-Critic reinforcement learning algorithm with probabilistic prioritized experience replay and a distributed architecture (DPER-MASAC) is proposed, building on distributed architectures from the single-agent domain. In DPER-MASAC, multiple workers collect experience data by interacting with their environments in parallel. In a high-throughput multi-agent system, mostly recent experience tends to be sampled with high probability; to overcome this limitation, a more general, improved scheme based on priority probabilities is introduced for sampling and using experience data when updating the agents' network parameters. To verify the effectiveness of DPER-MASAC, comparative experiments were conducted in two types of predator-prey environments in which both cooperation and competition exist among the agents. Multi-agent soft Actor-Critic (MASAC) and multi-agent soft Actor-Critic with prioritized experience replay (PER-MASAC) served as the two baselines and were compared with DPER-MASAC in environments of gradually increasing difficulty. The results indicate that the predator policy trained by DPER-MASAC performs best in terms of both final performance and success rate.
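The abstract does not spell out the paper's improved sampling scheme, so the following is only a minimal sketch of the standard proportional prioritized replay idea that such schemes build on, not the DPER-MASAC method itself. The class name `ProportionalReplayBuffer`, the parameters `alpha` and `eps`, and the use of the absolute TD error as priority are all assumptions made for illustration.

```python
import random


class ProportionalReplayBuffer:
    """Toy replay buffer with proportional prioritized sampling (hypothetical)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps      # keeps zero-priority transitions sampleable
        self.items = []
        self.priorities = []
        self.next_idx = 0

    def add(self, transition, priority=None):
        # Assumption: new transitions default to the current maximum priority,
        # which makes them likely to be sampled soon after insertion.
        if priority is None:
            priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest slot (ring buffer).
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def probabilities(self):
        # P(i) = (p_i + eps)^alpha / sum_k (p_k + eps)^alpha
        scaled = [(p + self.eps) ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        # Draw indices with replacement according to the priority distribution.
        probs = self.probabilities()
        idxs = rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        return idxs, [self.items[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh priorities with the new |TD error|.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err)
```

In a distributed setup like the one the abstract describes, several workers would call `add` while a learner calls `sample` and `update_priorities`; synchronization between them is omitted here.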