Abstract: Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, where agents update their policies one after another in a sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, sequential learning can be slow because only one agent is learning at any time, so it is not always practical either. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning while also minimizing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
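The learning-rate assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the fast-learning agent is chosen or how long it keeps that role, so the round-robin rotation, the function name `multi_timescale_lrs`, and the parameters `fast_lr`, `slow_lr`, and `switch_period` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def multi_timescale_lrs(
    step: int,
    n_agents: int,
    fast_lr: float = 1e-3,
    slow_lr: float = 1e-4,
    switch_period: int = 1000,
) -> List[float]:
    """Return one learning rate per agent for the given training step.

    Assumption (not from the source): the fast-learning role rotates
    round-robin, moving to the next agent every `switch_period` steps.
    """
    if n_agents < 1 or switch_period < 1:
        raise ValueError("n_agents and switch_period must be positive")

    # Index of the agent that currently learns on the fast timescale.
    fast_agent = (step // switch_period) % n_agents

    # Every agent keeps learning; only the step size differs.
    return [fast_lr if i == fast_agent else slow_lr for i in range(n_agents)]


if __name__ == "__main__":
    # With 3 agents and a period of 10 steps, the fast role moves 0 -> 1 -> 2 -> 0.
    for step in (0, 10, 20, 30):
        print(step, multi_timescale_lrs(step, n_agents=3, switch_period=10))
```

Under this sketch, setting `slow_lr` to 0 corresponds to sequential learning (only one agent updates at a time), and setting `slow_lr` equal to `fast_lr` corresponds to independent learning (all agents update at the same rate). The multi-timescale scheme sits between the two.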

Multiagent Continual Coordination via Progressive Task Contextualization

Multi-Agent Concentrative Coordination with Decentralized Task Representation

Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

Coordination as inference in multi-agent reinforcement learning

A Cooperative Multi-Agent Reinforcement Learning Method Based on Coordination Degree

Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization

Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Towards Efficient Multi-Agent Learning Systems

Multiagent Q-learning with Sub-Team Coordination

Consciousness-Aware Multi-Agent Reinforcement Learning

SC-MAIRL: Semi-Centralized Multi-Agent Imitation Reinforcement Learning

Optimistic sequential multi-agent reinforcement learning with motivational communication

Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

Learning to Coordinate with Anyone