Abstract: Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, where agents update their policies one after another in a sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, which alleviates the non-stationarity caused by simultaneous updates to the other agents' policies. However, sequential learning can be slow because only one agent learns at any given time, which may make it impractical in some settings. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, the other agents are allowed to update their policies as well, but at a slower rate. This makes learning faster than sequential learning while still reducing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
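The three update schemes contrasted in the abstract differ only in which agents learn at which rate. The sketch below illustrates that difference on a toy problem under loudly stated assumptions. It is not the authors' algorithm and not part of epymarl. It replaces deep RL with exact softmax policy gradients on a two-agent cooperative matrix game. The names `learning_rates`, `train`, `period`, `lr_fast` and `lr_slow`, the round-robin rotation of the "fast" agent, and all numeric values are hypothetical choices made for illustration. Only the schedule follows the abstract: independent learning gives every agent the same rate, sequential learning freezes all but one agent, and multi-timescale learning lets the others keep learning at a slower rate.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = math.fsum(exps)
    return [e / total for e in exps]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [math.fsum(row[j] * v[j] for j in range(len(v))) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_policy_gradient(pi: Vector, q: Vector) -> Vector:
    """Exact gradient of sum_k pi_k * q_k w.r.t. the softmax logits, where q holds
    this agent's action values given the other agent's current (fixed) policy."""
    v = math.fsum(p * x for p, x in zip(pi, q))  # current expected payoff
    return [p * (x - v) for p, x in zip(pi, q)]


def expected_reward(pi0: Vector, payoff: Matrix, pi1: Vector) -> float:
    """Shared team reward E[payoff[a0][a1]] when each agent samples independently."""
    return math.fsum(pi0[i] * payoff[i][j] * pi1[j]
                     for i in range(len(pi0)) for j in range(len(pi1)))


def learning_rates(scheme: str, step: int, n_agents: int, period: int,
                   lr_fast: float, lr_slow: float) -> List[float]:
    """Per-agent learning rates at a given step (hypothetical schedule).

    independent:     every agent learns at lr_fast simultaneously.
    sequential:      the active agent learns at lr_fast, all others are frozen.
    multi_timescale: the active agent learns at lr_fast, all others at lr_slow.
    The active agent rotates round-robin every `period` steps.
    """
    if scheme == "independent":
        return [lr_fast] * n_agents
    active = (step // period) % n_agents
    if scheme == "sequential":
        return [lr_fast if i == active else 0.0 for i in range(n_agents)]
    if scheme == "multi_timescale":
        return [lr_fast if i == active else lr_slow for i in range(n_agents)]
    raise ValueError(f"unknown scheme: {scheme!r}")


def train(scheme: str, payoff: Matrix, steps: int = 2000, period: int = 50,
          lr_fast: float = 1.0, lr_slow: float = 0.1) -> float:
    """Gradient ascent on the shared reward of a two-agent matrix game."""
    # Mirrored, miscoordinated start: agent 0 prefers action 0, agent 1 prefers action 1.
    logits = [[1.0, -1.0], [-1.0, 1.0]]
    for step in range(steps):
        pi0, pi1 = softmax(logits[0]), softmax(logits[1])
        # Both gradients use the same snapshot of the policies; the rates decide who moves.
        grads = [softmax_policy_gradient(pi0, matvec(payoff, pi1)),
                 softmax_policy_gradient(pi1, matvec(transpose(payoff), pi0))]
        rates = learning_rates(scheme, step, 2, period, lr_fast, lr_slow)
        for i in range(2):
            logits[i] = [t + rates[i] * g for t, g in zip(logits[i], grads[i])]
    return expected_reward(softmax(logits[0]), payoff, softmax(logits[1]))


if __name__ == "__main__":
    coordination_game = [[1.0, 0.0], [0.0, 1.0]]  # reward only when actions match
    for scheme in ("independent", "sequential", "multi_timescale"):
        print(f"{scheme:>16}: expected team reward = {train(scheme, coordination_game):.3f}")
```

In this contrived coordination game, the payoff is symmetric and the starting policies mirror each other. As a result, the simultaneous independent updates keep the two policies exact mirror images. They drift toward the uniform saddle point, and the team reward never exceeds 0.5. Giving one agent the fast rate at a time breaks that symmetry and lets the pair coordinate. This example shows one failure mode of simultaneous updates. It does not reproduce the paper's convergence result or its experiments.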

Timesharing-tracking: A New Framework for Decentralized Reinforcement Learning in Cooperative Multi-Agent Systems

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding

Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games

Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Reinforcement learning for encouraging cooperation in a multiagent system

Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning

Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

Learning to Share in Multi-Agent Reinforcement Learning

A Cooperative Multi-Agent Reinforcement Learning Algorithm Based on Dynamic Self-Selection Parameters Sharing

A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment

An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control

Individual Reward Assisted Multi-Agent Reinforcement Learning

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

Coalition Game of Radar Network for Multitarget Tracking via Model-Based Multiagent Reinforcement Learning

Learning Multi-Agent Cooperation via Considering Actions of Teammates

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Egoism, utilitarianism and egalitarianism in multi-agent reinforcement learning

A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint

Coordinating Multi-Agent Reinforcement Learning Via Dual Collaborative Constraints