Abstract: In large metropolises, it is critical to efficiently allocate resources such as electricity, medical care, and transportation to meet citizens' living demands, according to the spatio-temporal distributions of resources and demands. Prior work has addressed such problems extensively with Multi-Agent Reinforcement Learning (MARL) methods, in which multiple agents cooperatively regulate and allocate resources to meet the demands. However, given the large number of agents in large cities, existing MARL methods lack efficient parameter sharing strategies among agents to reduce computational complexity. Two primary challenges remain for efficient parameter sharing: (1) during RL training, the behavior of agents changes significantly, which limits the performance of group parameter sharing based on a fixed role division decided before training; (2) the behavior of agents forms complicated action trajectories in which their role characteristics are implicit, which makes it difficult to dynamically adjust agent role divisions during training. In this paper, we propose Dynamic Parameter Sharing (DyPS) to address these challenges. We design self-supervised learning tasks to extract the implicit behavioral characteristics from the action trajectories of agents. Based on the obtained behavioral characteristics, we propose a hierarchical MARL framework that dynamically revises the agent role divisions during training and shares parameters among agents with the same role, thereby reducing computational complexity. In addition, our framework can be combined with various typical MARL algorithms, including IPPO and MAPPO. We conduct 7 experiments in 4 representative resource allocation scenarios, and the results show that our method outperforms state-of-the-art baseline methods by up to 31%.
Our source code is available at https://github.com/tsinghua-fib-lab/DyPS.
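
The idea of sharing parameters among agents with the same role, and revising the role division during training, can be illustrated with a minimal sketch. This is not the DyPS implementation: the names `assign_roles` and `SharedPolicyBank` are hypothetical, the behavior embeddings are assumed to come from some trajectory encoder (the abstract's self-supervised tasks are not reproduced here), and plain k-means stands in for whatever role-division step the paper actually uses.

```python
from typing import Dict, List, Sequence


def assign_roles(embeddings: Sequence[Sequence[float]],
                 init_centroids: Sequence[Sequence[float]],
                 iters: int = 10) -> List[int]:
    """Plain k-means over per-agent behavior embeddings (an assumed stand-in
    for the role-division step); returns one role id per agent."""
    centroids = [list(c) for c in init_centroids]  # copy, do not mutate input
    roles = [0] * len(embeddings)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        roles = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2
                                  for x, c in zip(e, centroids[k])))
            for e in embeddings
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [e for e, r in zip(embeddings, roles) if r == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return roles


class SharedPolicyBank:
    """Hypothetical container: one parameter vector per role, shared by
    every agent currently assigned to that role."""

    def __init__(self, n_roles: int, n_params: int) -> None:
        self.params = [[0.0] * n_params for _ in range(n_roles)]
        self.role_of: Dict[int, int] = {}

    def set_roles(self, roles: Sequence[int]) -> None:
        # Called whenever the role division is revised during training.
        self.role_of = dict(enumerate(roles))

    def params_for(self, agent_id: int) -> List[float]:
        return self.params[self.role_of[agent_id]]


# Four agents, two roles: agents 0-1 behave alike, agents 2-3 behave alike.
early = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.2, 4.9]]
bank = SharedPolicyBank(n_roles=2, n_params=8)
roles_early = assign_roles(early, [early[0], early[2]])
bank.set_roles(roles_early)
shared_early = bank.params_for(0) is bank.params_for(1)

# Later in training, agent 1's behavior drifts toward agents 2-3,
# so the role division is recomputed and agent 1 switches parameter sets.
late = [[0.0, 0.1], [4.9, 5.0], [5.0, 5.1], [5.2, 4.9]]
roles_late = assign_roles(late, [late[0], late[2]])
bank.set_roles(roles_late)
shared_late = bank.params_for(1) is bank.params_for(2)

# Parameter count with sharing versus one independent set per agent.
n_shared = sum(len(p) for p in bank.params)
n_independent = len(late) * 8
```

In this toy setting the bank holds 2 parameter vectors instead of 4, and recomputing the roles is enough to move an agent from one shared set to another; a fixed pre-training division would have kept agent 1 tied to agent 0 after its behavior changed.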

A Cooperative Multi-Agent Reinforcement Learning Algorithm Based on Dynamic Self-Selection Parameters Sharing

Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

DyPS: Dynamic Parameter Sharing in Multi-Agent Reinforcement Learning for Spatio-Temporal Resource Allocation

Parameter Sharing with Network Pruning for Scalable Multi-Agent Deep Reinforcement Learning

Multi-Agent Reinforcement Learning and Genetic Policy Sharing

Individual Reward Assisted Multi-Agent Reinforcement Learning

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

ADMN: Agent-Driven Modular Network for Dynamic Parameter Sharing in Cooperative Multi-Agent Reinforcement Learning

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Intention Propagation for Multi-agent Reinforcement Learning

Stabilizing Multi-Agent Deep Reinforcement Learning by Implicitly Estimating Other Agents’ Behaviors

Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning

Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning

Multi-agent cooperation through learning-aware policy gradients

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

Off-Agent Trust Region Policy Optimization