Abstract: Many real-world cooperative problems, such as urban traffic control and multi-role games, can be addressed with Multi-Agent Reinforcement Learning (MARL). However, policy learning in MARL typically involves long training trajectories and partial observability, which lead to sparse rewards and a lack of decision information. To address these issues, this article studies hierarchical deep MARL and proposes a novel model named Hierarchical Spatio-Temporal Communication Network (HSTCN). HSTCN designs hierarchical policies at two time granularities: a high-level policy and a low-level policy. All agents jointly enter a joint policy that contains both levels, and each agent has its own execution policy. Specifically, the high-level policy provides intrinsic goals and continuous reward samples to the low-level policy to alleviate reward sparsity. The low-level policy uses this information to improve the efficiency of the agents' execution policies and interacts with the environment to optimize the next reward. In addition, the high-level policy uses a graph-structured model with spatio-temporal abstraction. This spatio-temporal model expands receptive fields to take in neighborhood information and facilitates learning more robust policies by capturing the spatial dependencies and temporal dynamics of the underlying graph. An evaluation network is also added to increase robustness. Empirically, we demonstrate the effectiveness of HSTCN in a long-trajectory training environment using Simulation of Urban MObility (SUMO), and we test StarCraft II maps as an abstract environment. The experimental results show that HSTCN outperforms other advanced algorithms and support the rationality of its design.
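
The two-granularity idea in the abstract can be illustrated with a toy sketch. This is not the HSTCN implementation: it uses a single agent on a 1-D line, and every name here (`GOAL_INTERVAL`, `high_level_goal`, `low_level_action`, `intrinsic_reward`, `run_episode`) is a hypothetical stand-in, with hand-written rules in place of learned policies. It only shows how a high-level policy acting on a coarse time scale might hand the low-level policy a goal and a dense intrinsic reward while the environment's own reward stays sparse.

```python
GOAL_INTERVAL = 5  # assumption: the high-level policy acts once every 5 low-level steps


def high_level_goal(pos, target, reach):
    """Stand-in for a learned high-level policy: choose a subgoal at most
    `reach` cells away from `pos`, in the direction of the final target."""
    move = max(-reach, min(reach, target - pos))
    return pos + move


def low_level_action(pos, goal):
    """Stand-in for a learned low-level policy: step -1, 0, or +1 toward the goal."""
    return (goal > pos) - (goal < pos)


def intrinsic_reward(prev_pos, new_pos, goal):
    """Dense signal: how much this step reduced the distance to the current subgoal."""
    return abs(goal - prev_pos) - abs(goal - new_pos)


def run_episode(start=0, target=12, horizon=40, goal_interval=GOAL_INTERVAL):
    pos, goal = start, start
    intrinsic, extrinsic = [], []
    for t in range(horizon):
        if t % goal_interval == 0:
            # Coarse time scale: refresh the subgoal.
            goal = high_level_goal(pos, target, goal_interval)
        # Fine time scale: act in the environment at every step.
        new_pos = pos + low_level_action(pos, goal)
        intrinsic.append(intrinsic_reward(pos, new_pos, goal))
        # Sparse environment reward: nonzero only when the target is reached.
        extrinsic.append(1.0 if new_pos == target else 0.0)
        pos = new_pos
        if pos == target:
            break
    return pos, intrinsic, extrinsic


if __name__ == "__main__":
    final_pos, intrinsic, extrinsic = run_episode()
    print("final position:", final_pos)
    print("steps with nonzero extrinsic reward:", sum(r != 0 for r in extrinsic))
    print("steps with nonzero intrinsic reward:", sum(r != 0 for r in intrinsic))
```

In this toy setting the environment reward is nonzero on only the final step, while the intrinsic reward gives feedback on every step. That contrast is the reward-sparsity relief the abstract attributes to the high-level policy. The graph-based spatio-temporal model and the evaluation network are not represented here.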

Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning

Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement Learning

STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

S2rl

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

Priority over Quantity: A Self-Incentive Credit Assignment Scheme for Cooperative Multiagent Reinforcement Learning

Individual Reward Assisted Multi-Agent Reinforcement Learning

Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization

Selective Learning for Sample-Efficient Training in Multi-Agent Sparse Reward Tasks

Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning

Hierarchical Coordination Multi-Agent Reinforcement Learning with Spatio-Temporal Abstraction

Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

Credit assignment with predictive contribution measurement in multi-agent reinforcement learning

Credit assignment in heterogeneous multi-agent reinforcement learning for fully cooperative tasks

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework

Progressive Diversifying Policy for Multi-Agent Reinforcement Learning

Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles