Friend-or-Foe Deep Deterministic Policy Gradient

Hao Jiang, Dianxi Shi, Chao Xue, Yajie Wang, Gongju Wang, Yongjun Zhang
DOI: https://doi.org/10.1109/smc42975.2020.9283033
2020-01-01
Abstract: One of the toughest challenges in multi-agent deep reinforcement learning (MADRL) is that when the opponents' policies change rapidly, the collaborative agents cannot learn to respond to them effectively. As a result, the policy learned by the collaborative agents may be only locally optimal with respect to the opponents' current policies. To address this problem, we propose a novel algorithm termed Friend-or-Foe Deep Deterministic Policy Gradient (FD2PG), in which the cooperative agents are trained to be more robust and to cooperate more strongly in continuous action spaces. These collaborative agents generalize easily and respond correctly even when their opponents' policies change. Inspired by the classic Friend-or-Foe Q-learning algorithm (FFQ), we introduce the idea of minimizing over the foes and maximizing over the friends into multi-agent deep deterministic policy gradient (MADDPG), a centralized-training, distributed-execution framework, to enhance the collaborative agents' robustness and cooperativity. In addition, we introduce a Minimax Multi-Agent Learning (MMAL) method to explore two special equilibria (the adversarial equilibrium and the coordination equilibrium), which can guarantee the convergence of FD2PG and improve optimization. Extensive fine-grained experiments, including four representative scenario experiments and two scale-performance correlation experiments, were conducted to demonstrate the superior performance of FD2PG compared with existing baselines.
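The "minimize over the foes, maximize over the friends" idea that the abstract borrows from FFQ can be illustrated with a minimal tabular sketch. This is not FD2PG itself, which works with actor-critic networks in continuous action spaces; it assumes a small discrete Q-table and pure strategies only, and the function name `friend_or_foe_value` and the example numbers are hypothetical.

```python
def friend_or_foe_value(q_values):
    """Pure-strategy max-min value of a tabular Q (illustrative assumption).

    q_values[i][j] is the Q-value when the friends play joint action i
    and the foes play joint action j.
    Returns (value, index of the chosen friend action).
    """
    # For each friend action, assume the foes pick the reply that hurts most.
    worst_case = [min(row) for row in q_values]
    # The friends then pick the action whose worst case is best.
    best = max(range(len(worst_case)), key=worst_case.__getitem__)
    return worst_case[best], best


# Hypothetical 3x2 table: three friend joint actions, two foe joint actions.
q = [
    [3.0, -1.0],
    [2.0, 1.0],
    [0.0, 4.0],
]
value, action = friend_or_foe_value(q)
print(value, action)
```

In this toy table the friends prefer the second row: its best-case payoff is lower than that of the other rows, but its worst-case payoff is the highest, which is the kind of robustness to changing opponent policies that the abstract describes.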