Abstract: Many multiagent reinforcement learning (MARL) methods rely on centralized training. These methods require all the agents to share various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. Consequently, the methods produce agent policies whose prescriptions and performance are contingent on the other agents behaving as the centralized training assumed. However, in many contexts, such as mixed or adversarial settings, this assumption may not hold. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training, resulting in each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, which are often exhibited in practice, large numbers of other agents can be modeled aggregately by the count vectors of their actions instead of by individual agent models. More importantly, we may model the distribution of these vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C along with other known baselines on a novel Organization domain, which we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, more robust to noise, and can scale to learning in systems with up to a hundred agents.
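To make the aggregate modeling idea concrete, the following is a minimal sketch of a Dirichlet-multinomial model over action count vectors, assuming that the counts of other agents' actions are observed exactly (no observation noise) and that a single Dirichlet prior is shared by the whole homogeneous population. It is not the IA2C implementation: the abstract's method also handles noisy observations and embeds the belief inside an actor-critic, neither of which is shown here. The function names `dm_log_pmf`, `update_dirichlet`, `predictive_mean`, and `sample_counts` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def dm_log_pmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of an action count vector under Dirichlet-multinomial(N, alpha).

    N = sum(counts) is the number of other agents; alpha holds one pseudo-count per action.
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient N! / prod_k c_k!  (agents are anonymous, so only counts matter)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Ratio of Dirichlet normalizers: B(alpha + c) / B(alpha)
    log_ratio = math.lgamma(a0) - math.lgamma(n + a0)
    log_ratio += sum(math.lgamma(c + a) - math.lgamma(a) for c, a in zip(counts, alpha))
    return log_coef + log_ratio


def update_dirichlet(alpha: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Conjugate update: the posterior after observing a count vector is Dirichlet(alpha + counts)."""
    return [a + c for a, c in zip(alpha, counts)]


def predictive_mean(alpha: Sequence[float]) -> List[float]:
    """Expected probability that an arbitrary other agent takes each action."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]


def sample_counts(alpha: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Draw p ~ Dirichlet(alpha) via normalized gamma variates, then counts ~ Multinomial(n, p)."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    p = [x / total for x in g]
    counts = [0] * len(alpha)
    for action in rng.choices(range(len(alpha)), weights=p, k=n):
        counts[action] += 1
    return counts


if __name__ == "__main__":
    # 100 other agents, 3 actions, uniform prior over action frequencies
    alpha = [1.0, 1.0, 1.0]
    observed = [60, 30, 10]
    posterior = update_dirichlet(alpha, observed)
    print(posterior)                                  # [61.0, 31.0, 11.0]
    print(predictive_mean(posterior))                 # per-action probabilities
    print(sample_counts(posterior, 100, random.Random(0)))
```

The point of this design is that the belief over a population of N agents is summarized by K numbers (one per action) rather than by N individual models, and each update is a single vector addition.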

Cognition-Oriented Multiagent Reinforcement Learning

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Consciousness-Aware Multi-Agent Reinforcement Learning

Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium

Attention Enhanced Reinforcement Learning for Multi-Agent Cooperation

Continual Multi-Objective Reinforcement Learning Via Reward Model Rehearsal

Enhancing cooperation by cognition differences and consistent representation in multi-agent reinforcement learning

Competitive Multi-agent Deep Reinforcement Learning with Counterfactual Thinking

Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning

Optimistic sequential multi-agent reinforcement learning with motivational communication

Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning

Modeling and reinforcement learning in partially observable many-agent systems

LMRL: a Multi-Agent Reinforcement Learning Model and Algorithm

Reinforcement learning for encouraging cooperation in a multiagent system

Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Modeling Sensorimotor Coordination as Multi-Agent Reinforcement Learning with Differentiable Communication

Toward a Psychology of Deep Reinforcement Learning Agents Using a Cognitive Architecture