Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Hongyao Tang, Jianye Hao, Tangjie Lv, Yingfeng Chen, Zongzhang Zhang, Hangtian Jia, Chunxu Ren, Yan Zheng, Zhaopeng Meng, Changjie Fan, Li Wang
DOI: https://doi.org/10.48550/arXiv.1809.09332
2019-07-04
Abstract: Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and exponentially increasing policy space. It would be even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, i.e., Fever Basketball Defense.
Machine Learning, Artificial Intelligence, Multiagent Systems
What problem does this paper attempt to address?
This paper addresses how to learn effectively in cooperative multi-agent reinforcement learning (MARL) tasks where rewards are sparse and delayed. It focuses on three challenges:

1. **Non-stationary environment**: Because every agent keeps updating its policy while learning, the environment appears non-stationary from each agent's point of view, which makes learning harder.
2. **Exponential growth of the policy space**: The policy space grows exponentially with the number of agents, so traditional reinforcement learning methods are hard to apply directly.
3. **Sparse and delayed rewards**: In many real-world applications, reward signals are sparse and delayed, which further increases the learning difficulty, especially for tasks that require long-term planning.

To address these challenges, the paper uses temporal abstraction and proposes a hierarchical deep multi-agent reinforcement learning (hierarchical deep MARL) method. Temporal abstraction decomposes the problem into a hierarchy of different time scales, in which agents learn high-level coordination on top of skills learned independently at the low level, thereby reducing the learning difficulty. The specific contributions are:

- **Three hierarchical deep MARL architectures**: the hierarchical independent learner (h-IL), the hierarchical communication network (h-Comm), and the hierarchical Qmix network (h-Qmix), each suited to a different MARL paradigm.
- **A new experience replay mechanism**: augmented concurrent experience replay (ACER), which alleviates the sparsity of high-level transitions and the non-stationarity of multi-agent learning by augmenting high-level experience replay and using concurrent sampling.
- **Experimental verification**: experiments in two environments with extremely sparse feedback, namely a variety of Multiagent Trash Collection tasks and the online mobile game "Fever Basketball Defense".

With these methods, the paper shows that hierarchical deep multi-agent reinforcement learning can learn cooperative strategies more effectively in environments with sparse and delayed rewards.
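
The concurrent-sampling idea behind ACER can be illustrated with a minimal sketch. The summary does not spell out the mechanism, so everything here is an assumption: each agent keeps its own buffer, all buffers are written in lock-step, and a mini-batch is drawn with one shared set of indices so that every agent trains on experience from the same moments. The class name `ConcurrentReplay` and the transition layout are hypothetical, and the augmentation of high-level transitions is not shown.

```python
import random
from collections import deque


class ConcurrentReplay:
    """Hypothetical per-agent replay buffers sampled with shared indices.

    This is an illustrative sketch, not the paper's implementation.
    """

    def __init__(self, n_agents, capacity):
        # One bounded buffer per agent; all have the same capacity so
        # position i always refers to the same time step for every agent.
        self.buffers = [deque(maxlen=capacity) for _ in range(n_agents)]

    def add(self, transitions):
        # `transitions` holds one entry per agent, all from the same step.
        if len(transitions) != len(self.buffers):
            raise ValueError("expected exactly one transition per agent")
        for buf, tr in zip(self.buffers, transitions):
            buf.append(tr)

    def __len__(self):
        return len(self.buffers[0])

    def sample(self, batch_size, rng=random):
        # Draw the indices once and reuse them for every agent, so the
        # i-th sample of each agent comes from the same time step.
        idx = rng.sample(range(len(self)), batch_size)
        return [[buf[i] for i in idx] for buf in self.buffers]


if __name__ == "__main__":
    replay = ConcurrentReplay(n_agents=2, capacity=100)
    for t in range(10):
        # Assumed layout: (observation, goal, reward, next observation, step)
        replay.add([(f"o{a}_{t}", a, 0.0, f"o{a}_{t + 1}", t) for a in range(2)])
    batch = replay.sample(4, rng=random.Random(0))
    # The time steps line up across the two agents.
    print([tr[-1] for tr in batch[0]], [tr[-1] for tr in batch[1]])
```

Sampling the same indices for all agents means each agent's update sees what its teammates were doing at that moment, which is one plausible way to reduce the non-stationarity noted above.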