Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama

DOI: https://doi.org/10.48550/arXiv.1910.01465

2019-12-03

Abstract: Many real-world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus investigate the presence of a common weakness in single-agent RL, namely value function overestimation bias, in the multi-agent setting. Based on our findings, we propose an approach that reduces this bias by using double centralized critics. We evaluate it on six mixed cooperative-competitive tasks, showing a significant advantage over current methods. Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.

Machine Learning, Artificial Intelligence, Multiagent Systems

What problem does this paper attempt to address?

This paper addresses value function overestimation bias in multi-agent reinforcement learning (MARL). The authors note that overestimation bias is a known weakness of single-agent reinforcement learning and hypothesize that it also arises in multi-agent settings. To test this, they study the popular multi-agent deep deterministic policy gradient method (MADDPG) and find experimentally that it does overestimate values. Based on this finding, they propose multi-agent twin delayed deep deterministic policy gradient (MATD3), which reduces the bias by using two centralized critic networks. Both critics are trained toward a shared target built from the smaller of their two value estimates, so an overestimate made by one critic alone is not propagated into the target. MATD3 also adopts delayed policy updates and target policy smoothing to further improve the stability and efficiency of learning. Evaluated on six mixed cooperative-competitive tasks, MATD3 shows a significant empirical advantage over existing methods. The authors also apply the method to high-dimensional robotic tasks and show that it can learn decentralized control policies there, which suggests that MATD3 both reduces overestimation bias and performs better in complex multi-agent environments.
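The double-critic target and target policy smoothing described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `clipped_double_q_target` and `smooth_target_action` are hypothetical, the critic outputs are passed in as plain arrays rather than computed by networks, and the default values for `gamma`, `sigma`, `clip`, and the action bounds are illustrative only.

```python
import numpy as np


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.95):
    """Target y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next stand in for the two target critics' estimates
    at the next joint state-action (assumed inputs, not real networks).
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    # Taking the element-wise minimum discards the more optimistic estimate.
    q_min = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))
    return reward + gamma * (1.0 - done) * q_min


def smooth_target_action(action, rng, sigma=0.2, clip=0.5, low=-1.0, high=1.0):
    """Add clipped Gaussian noise to a target action, then clip to bounds."""
    action = np.asarray(action, dtype=float)
    noise = np.clip(rng.normal(0.0, sigma, size=action.shape), -clip, clip)
    return np.clip(action + noise, low, high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = clipped_double_q_target(
        reward=[1.0, 0.0], done=[0, 1],
        q1_next=[2.0, 5.0], q2_next=[3.0, 4.0], gamma=0.5,
    )
    print(y)
    print(smooth_target_action([0.9, -0.9, 0.0], rng))
```

In a full training loop, both critics would regress toward the same target `y`, and the actor and target networks would be updated only every few critic updates; that delay is omitted here for brevity.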

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Decentralized multi-agent reinforcement learning based on best-response policies

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

The challenge of redundancy on multi-agent value factorisation

On Centralized Critics in Multi-Agent Reinforcement Learning

Cooperative and Competitive Biases for Multi-Agent Reinforcement Learning

Efficient Continuous Control with Double Actors and Regularized Critics

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

An Overestimation Reduction Method Based on the Multi-step Weighted Double Estimation Using Value-Decomposition Multi-agent Reinforcement Learning

Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning

Bi-Level Actor-Critic for Multi-Agent Coordination

Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning

Self-attention-based multi-agent continuous control method in cooperative environments

Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

A Survey and Critique of Multiagent Deep Reinforcement Learning

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

From Centralized to Self-Supervised: Pursuing Realistic Multi-Agent Reinforcement Learning