Abstract: In large metropolises, it is critical to allocate resources such as electricity, medical care, and transportation efficiently to meet citizens' living demands, according to the spatio-temporal distributions of resources and demands. Previous work has addressed such problems extensively with Multi-Agent Reinforcement Learning (MARL) methods, in which multiple agents cooperatively regulate and allocate resources to meet the demands. However, given the large number of agents in large cities, existing MARL methods lack efficient parameter sharing strategies among agents to reduce computational complexity. Two primary challenges remain for efficient parameter sharing: (1) during RL training, the behavior of agents changes significantly, which limits the performance of group parameter sharing based on a fixed role division decided before training; (2) the behavior of agents forms complicated action trajectories in which role characteristics are implicit, making it difficult to adjust agent role divisions dynamically during training. In this paper, we propose Dynamic Parameter Sharing (DyPS) to address these challenges. We design self-supervised learning tasks to extract the implicit behavioral characteristics from the action trajectories of agents. Based on the obtained behavioral characteristics, we propose a hierarchical MARL framework that dynamically revises the agent role divisions during training and shares parameters among agents with the same role, reducing computational complexity. In addition, our framework can be combined with various typical MARL algorithms, such as IPPO and MAPPO. We conduct 7 experiments in 4 representative resource allocation scenarios, where extensive results demonstrate our method's superior performance, outperforming the state-of-the-art baseline methods by up to 31%.
Our source code is available at https://github.com/tsinghua-fib-lab/DyPS.
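
The idea of sharing parameters among agents with the same role, and revising the roles as behavior changes, can be illustrated with a minimal sketch. This is not the DyPS implementation. It assumes each agent already has a behavior embedding (a stand-in for the features the self-supervised tasks would extract), uses plain k-means as a substitute role-division step, and represents each role's network by a simple list of floats. All names here (`RoleSharedPolicies`, `reassign`, `params_for`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point initialization (assumed choice for this sketch)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points, k, iters=20):
    """Plain k-means; returns one cluster index (used as a role id) per point."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


class RoleSharedPolicies:
    """One parameter set per role; agents with the same role share that object."""

    def __init__(self, num_roles, param_dim):
        # Placeholder parameters; a real system would hold one network per role.
        self.params = [[0.0] * param_dim for _ in range(num_roles)]
        self.role_of = {}

    def reassign(self, embeddings):
        """Recompute roles from behavior embeddings (dict: agent id -> vector)."""
        agents = sorted(embeddings)
        labels = kmeans([embeddings[a] for a in agents], len(self.params))
        self.role_of = dict(zip(agents, labels))

    def params_for(self, agent):
        """Return the shared parameter set for the agent's current role."""
        return self.params[self.role_of[agent]]


# Hypothetical embeddings: agents 0 and 1 behave alike, as do agents 2 and 3.
embeddings = {0: [0.0, 0.1], 1: [0.1, 0.0], 2: [5.0, 5.1], 3: [5.1, 5.0]}
policies = RoleSharedPolicies(num_roles=2, param_dim=4)
policies.reassign(embeddings)
print(policies.role_of)

# Later in training, agent 1's behavior drifts toward agents 2 and 3.
embeddings[1] = [5.0, 5.0]
policies.reassign(embeddings)
print(policies.role_of)
```

Calling `reassign` periodically during training is what makes the sharing dynamic in this sketch: the number of parameter sets stays at `num_roles` regardless of the number of agents, while the mapping from agents to parameter sets follows their current behavior. The sketch does not address keeping role ids consistent across re-clusterings, which a practical system would need to handle.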

DyPS: Dynamic Parameter Sharing in Multi-Agent Reinforcement Learning for Spatio-Temporal Resource Allocation
