CIExplore: Curiosity and Influence-based Exploration in Multi-Agent Cooperative Scenarios with Sparse Rewards

Huanhuan Yang, Dianxi Shi, Chenran Zhao, Guojun Xie, Shaowu Yang
DOI: https://doi.org/10.1145/3459637.3482326
2021-01-01
Abstract: Learning in a sparse-reward setting is a well-known challenge in reinforcement learning (RL). In the single-agent domain, this challenge can be addressed by introducing exploration bonuses driven by intrinsic motivation, which encourage agents to visit unseen states. However, naively applying these methods to cooperative multi-agent reinforcement learning (MARL) settings with sparse rewards leads to problems such as agents misunderstanding environmental knowledge and failing to collaborate. To address these problems, we propose the Curiosity and Influence-based Explore (CIExplore) method, which includes a new form of intrinsic reward and an internal counterfactual advantage function. Concretely, the intrinsic reward combines a joint curiosity reward and an influence reward. The former is the variance of outputs across an ensemble of prediction models that take the joint observations and actions of all agents as inputs and predict the joint observations at the next time step. The latter quantifies the influence of one agent's behavior on the other agents' state-value functions. Because the joint curiosity reward is shared by all agents, we compute an internal counterfactual advantage function to address the resulting intrinsic reward assignment problem. We demonstrate the efficacy of CIExplore in multi-agent grid-world environments and show that it is compatible with both on-policy and off-policy MARL algorithms and scales to more complex settings in which the number of agents or the environment randomness increases.
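The joint curiosity reward described in the abstract (variance of the outputs of an ensemble of prediction models) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the function names, the random linear models standing in for learned predictors, and the choice to average the per-dimension variance into a single scalar.

```python
import numpy as np

def joint_curiosity_reward(predictions):
    """Disagreement of an ensemble about the next joint observation.

    predictions: array of shape (n_models, obs_dim); each row is one
    ensemble member's prediction. Averaging the per-dimension variance
    into one scalar is an assumption, not taken from the paper.
    """
    predictions = np.asarray(predictions, dtype=float)
    # Variance across ensemble members, computed per observation dimension.
    per_dim_variance = predictions.var(axis=0)
    return float(per_dim_variance.mean())

def make_linear_ensemble(n_models, in_dim, out_dim, seed=0):
    """Hypothetical stand-in for trained prediction models: random linear maps."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.1, size=(in_dim, out_dim)) for _ in range(n_models)]

def predict_next_joint_obs(ensemble, joint_obs, joint_actions):
    """Each model maps (joint observations, joint actions) to a predicted
    next joint observation; results are stacked to (n_models, obs_dim)."""
    x = np.concatenate([joint_obs, joint_actions])
    return np.stack([x @ weights for weights in ensemble])

# Usage: two agents, each with a 3-dim observation and a 2-dim one-hot action.
joint_obs = np.ones(6)
joint_actions = np.array([1.0, 0.0, 0.0, 1.0])
ensemble = make_linear_ensemble(n_models=5, in_dim=10, out_dim=6)
reward = joint_curiosity_reward(
    predict_next_joint_obs(ensemble, joint_obs, joint_actions)
)
```

When all ensemble members agree, the reward is zero; the more they disagree about the next joint observation, the larger the bonus shared by all agents.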