Abstract: One of the main challenges in reinforcement learning is how an agent should explore an environment with sparse rewards to learn the optimal policy. Although many option discovery methods have been proposed to improve exploration in sparse-reward domains, how to accelerate exploration in a near-optimal manner remains an open question. Recently, covering options were proposed to find a set of options that reduces the expected cover time of the environment, that is, the expected number of steps required to visit every state. Specifically, the covering options method constructs options from eigenvectors of the graph Laplacian matrix to minimize the environment’s expected cover time. However, computing the full graph Laplacian matrix directly usually has high time complexity, especially for a large sparse graph, so this method does not adequately solve the problem of accelerating exploration in sparse-reward domains. In this paper, we propose a new option discovery method, Min Degree and Max Distance (MDMD) options, which accelerates exploration in sparse-reward domains by reducing the expected cover time of the environment. Specifically, our algorithm heuristically selects two nonadjacent vertices of the graph given by the state transition matrix, those with the minimum degree and the maximum distance, to construct options. Using the transition function learned by the agent, the generated options provably reduce the environment’s expected cover time. Our method therefore accelerates exploration in sparse-reward domains without computing the graph Laplacian matrix or its eigenvectors. In six challenging sparse-reward environments, experimental results show that our approach significantly accelerates exploration and thus obtains a higher total cumulative reward than other option discovery methods.
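
The selection heuristic described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the minimum-degree and maximum-distance criteria are combined, so the code below makes an assumption: among all nonadjacent, mutually reachable vertex pairs, it prefers the smallest degree sum and breaks ties by the largest shortest-path distance. The function names `bfs_distances` and `select_option_endpoints`, the adjacency-dictionary representation, and the reading of the selected pair as the two endpoints of an option are all hypothetical and not taken from the paper.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_option_endpoints(adj):
    """Pick one nonadjacent vertex pair for a new option.

    ASSUMPTION (not from the paper): minimise the degree sum first,
    then maximise the shortest-path distance as a tie-breaker.
    Returns None if every reachable pair is already adjacent.
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    best_pair, best_key = None, None
    for u, v in combinations(sorted(adj), 2):
        # Skip adjacent pairs and pairs in different components.
        if v in adj[u] or v not in dist[u]:
            continue
        key = (len(adj[u]) + len(adj[v]), -dist[u][v])
        if best_key is None or key < best_key:
            best_pair, best_key = (u, v), key
    return best_pair


# Toy state graph: a five-state corridor 0 - 1 - 2 - 3 - 4.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(select_option_endpoints(chain))
```

On the corridor graph, the two end states have the lowest degree and are the farthest apart, so they are the pair selected. The sketch runs a breadth-first search from every vertex, which avoids any eigendecomposition but is only meant for small tabular graphs.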

Learning Multiagent Options for Tabular Reinforcement Learning Using Factor Graphs

Learning Multi-agent Skills for Tabular Reinforcement Learning using Factor Graphs

Multi-agent Covering Option Discovery through Kronecker Product of Factor Graphs

Multi-agent Deep Covering Option Discovery

Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Option-based Multi-agent Exploration

Algorithm for Automatic Constructing Option Based on Multi-Agent

MDMD Options Discovery for Accelerating Exploration in Sparse-Reward Domains

Option-Critic in Cooperative Multi-agent Systems

Multi-Level Discovery of Deep Options

OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

An agent with a sense of direction for option discovery in hierarchical reinforcement learning

Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning

Collaborative Multi-Agent Reinforcement Learning Based on a Novel Coordination Tree Frame with Dynamic Partition

Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning

Learning when to Transfer among Agents: an Efficient Multiagent Transfer Learning Framework

Learning to explore by reinforcement over high-level options

Unveiling Options with Neural Decomposition

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

A Fuzzy Curiosity-Driven Mechanism for Multi-Agent Reinforcement Learning