Abstract: In a multi-agent system (MAS), action semantics describe the different influences that agents' actions have on other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without distinguishing between their action semantics. This common practice degrades cooperation and coordination between agents in complex situations. Fully independent agent parameters, on the other hand, dramatically increase the computational cost and training difficulty. To exploit different action semantics while maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS). The UAS is the union of all agent actions with different semantics. All agents first compute their unified representation in the UAS, and then generate their heterogeneous action policies using different available-action masks. To further improve the training of the extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss, which predicts the policies of agents in other groups from trajectory information. As a universal method for physically heterogeneous MARL problems, we add the UAS to both value-based and policy-based MARL algorithms and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment demonstrate the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.
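
The masking step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the group names, action names, and the `shared_q` values are hypothetical placeholders standing in for the output of a shared network, and the helper functions are invented for illustration. The sketch only shows how a union action set and per-group available-action masks could yield different policies from one shared output.

```python
import math

# Hypothetical action semantics for two agent groups (illustrative names only).
GROUP_ACTIONS = {
    "attacker": ["noop", "move", "attack"],
    "healer": ["noop", "move", "heal"],
}


def build_unified_action_space(group_actions):
    """Return the union of all groups' actions, in first-seen order."""
    unified = []
    for actions in group_actions.values():
        for action in actions:
            if action not in unified:
                unified.append(action)
    return unified


def available_action_mask(unified, actions):
    """Boolean mask over the unified space: True where the group can act."""
    return [action in actions for action in unified]


def masked_greedy_action(q_values, mask):
    """Index of the highest-valued action among those the mask allows."""
    masked = [q if ok else -math.inf for q, ok in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


UAS = build_unified_action_space(GROUP_ACTIONS)
MASKS = {g: available_action_mask(UAS, acts) for g, acts in GROUP_ACTIONS.items()}

# Placeholder for one shared network's output over the unified action space.
shared_q = [0.1, 0.4, 0.9, 1.5]

# Each group applies its own mask to the same shared output.
chosen = {g: UAS[masked_greedy_action(shared_q, m)] for g, m in MASKS.items()}
```

In this toy setting both groups read the same values, yet the attacker group selects `attack` and the healer group selects `heal`, because each mask hides the actions that carry the other group's semantics.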

Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space
