Abstract: We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching states to adaptively target goals that are neither too difficult nor too easy. We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy for the (initially unknown) set of goal states that are reachable within $L$ steps in expectation from a reference state $s_0$ in a reward-free Markov decision process. In the tabular case with $S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A \epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, yielding the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, we anchor AdaGoal in goal-conditioned deep reinforcement learning, both conceptually and empirically, by connecting its idea of selecting "uncertain" goals to maximizing value ensemble disagreement.

What problem does this paper attempt to address?

The paper asks how to explore multi-goal environments efficiently and learn a near-optimal goal-conditioned policy in unsupervised goal-conditioned reinforcement learning (GC-RL). It proposes AdaGoal, a goal-selection scheme that adaptively targets goal states of intermediate difficulty, so that the agent learns to reach an initially unknown set of goal states without relying on external reward signals.

### Main contributions of the paper:

1. **Formalize the multi-goal exploration (MGE) objective**: Minimize the number of exploration steps (i.e., the sample complexity) needed to learn a goal-conditioned policy that is nearly optimal for all goal states reachable from the initial state $s_0$ within $L$ steps in expectation.
2. **Introduce AdaGoal**: A new goal-selection scheme that relies on a simple optimization problem and adaptively targets goal states of intermediate difficulty. It also provides a stopping rule for the algorithm and a set of candidate goal states that the agent is confident it can reliably reach.
3. **Design AdaGoal-UCBVI**: Instantiate AdaGoal in tabular Markov decision processes and prove that its sample complexity is nearly minimax optimal.
4. **Design AdaGoal-UCRL·VTR**: Instantiate AdaGoal in linear mixture Markov decision processes, giving the first goal-oriented PAC guarantee with linear function approximation.
5. **Application in deep GC-RL**: Connect AdaGoal's idea of selecting "uncertain" goals to maximizing value ensemble disagreement, and study this connection conceptually and empirically in deep goal-conditioned reinforcement learning.
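The link between "uncertain" goals and value ensemble disagreement can be illustrated with a minimal sketch. It assumes a hypothetical ensemble of value estimates for reaching each candidate goal from $s_0$ and scores each goal by the standard deviation across ensemble members. The function name, the toy numbers, and the choice of standard deviation as the disagreement measure are assumptions made for illustration; this is not the paper's optimization problem or its exact deep RL procedure.

```python
import numpy as np


def select_goal_by_disagreement(value_ensemble, candidate_goals):
    """Pick the candidate goal on which the ensemble disagrees the most.

    value_ensemble: array of shape (n_members, n_goals); entry [k, j] is
        member k's estimate of the value of reaching goal j from s_0.
    candidate_goals: sequence of n_goals goal identifiers.
    Returns (selected_goal, per-goal disagreement scores).
    """
    values = np.asarray(value_ensemble, dtype=float)
    # Disagreement = standard deviation across ensemble members (assumption).
    disagreement = values.std(axis=0)
    best = int(np.argmax(disagreement))
    return candidate_goals[best], disagreement


# Hypothetical predictions from a 3-member ensemble for three goals.
preds = np.array([
    [1.0, 4.0, 9.0],
    [1.0, 6.0, 9.5],
    [1.0, 2.0, 9.0],
])
goals = ["near", "frontier", "far"]
goal, scores = select_goal_by_disagreement(preds, goals)
print(goal, scores)
```

In this toy input all members agree on the "near" goal, so its score is zero and it is never selected; the goal with the widest spread of estimates is chosen instead.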
### Mathematical formulation of the core problem:

- **Definition 1**: For any policy $\pi$ and pair of states $(s, s')$, $V^\pi(s \to s')$ is the expected number of steps needed to reach $s'$ from $s$ when executing $\pi$, and $V^\star(s \to s') = \min_\pi V^\pi(s \to s')$ is its smallest value over policies.
- **Definition 2**: For any threshold $L \geq 1$, a goal state $g$ is said to be reliably $L$-reachable if $V^\star(s_0 \to g) \leq L$. The set of such goal states is denoted $G_L$.
- **Definition 4**: A multi-goal exploration (MGE) algorithm is called $(\epsilon, \delta, L, G)$-PAC if it stops in polynomial time and returns a set of goal states $X$ and a set of policies $\{\hat{\pi}_g\}_{g \in X}$ such that, with probability at least $1 - \delta$:
  - $\forall g \in X, \ V^{\hat{\pi}_g}(s_0 \to g) - V^\star(s_0 \to g) \leq \epsilon$
  - $G_L \subseteq X \subseteq G_{L+\epsilon}$

### Key assumptions and conclusions:

- **Assumption 3**: The action space contains a known reset action $a_{\text{reset}}$ such that executing $a_{\text{reset}}$ in any state $s$ returns the agent to the initial state $s_0$.
- **Lemma 5**: With a reset action, MGE can be solved in polynomial time, whereas MGE without a reset action requires exponential time.
- **Lemma 6**: For any $(\epsilon, \delta, L, G)$-PAC MGE algorithm, there exists an MDP and a goal space such that the algorithm requires at least $\Omega(L^3 SA \epsilon^{-2})$ steps to stop.
- **Theorem 8**: AdaGoal-UCBVI is $(\epsilon, \delta, L, S)$-PAC, and its sample complexity is \(\til
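To make Definitions 1 and 2 concrete, the following sketch computes $V^\star(s_0 \to g)$ by value iteration on a small, fully known tabular MDP and then lists the reliably $L$-reachable goals. The three-state chain, the helper names `optimal_hitting_time` and `reliably_reachable`, and the assumption that the transition kernel is known are all illustrative; in the MGE setting the kernel is unknown and has to be learned through exploration.

```python
import numpy as np


def optimal_hitting_time(P, goal, n_iters=10_000, tol=1e-10):
    """Approximate V*(s -> goal) for every state s by value iteration.

    P: array of shape (S, A, S), P[s, a, s'] = transition probability.
    Each step costs 1 and the goal state has value 0.
    """
    V = np.zeros(P.shape[0])
    for _ in range(n_iters):
        Q = 1.0 + P @ V          # shape (S, A): one step plus expected cost-to-go
        V_new = Q.min(axis=1)    # act greedily to minimize expected steps
        V_new[goal] = 0.0        # no further cost once the goal is reached
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def reliably_reachable(P, s0, L):
    """Return the goals g with V*(s0 -> g) <= L (the set G_L of Definition 2)."""
    return [g for g in range(P.shape[0]) if optimal_hitting_time(P, g)[s0] <= L]


# Toy chain with 3 states and 2 actions (hypothetical example):
# action 0 moves forward with probability 0.5 and otherwise stays put,
# action 1 is the reset action of Assumption 3 and returns to state 0.
S, A = 3, 2
P = np.zeros((S, A, S))
for s in range(S):
    nxt = min(s + 1, S - 1)
    P[s, 0, s] += 0.5
    P[s, 0, nxt] += 0.5
    P[s, 1, 0] = 1.0

V_to_2 = optimal_hitting_time(P, goal=2)
G_L = reliably_reachable(P, s0=0, L=3.0)
print(V_to_2[0], G_L)
```

In this chain each forward move succeeds with probability 0.5, so reaching state 1 from state 0 takes 2 steps in expectation and reaching state 2 takes 4. With $L = 3$, state 2 therefore falls outside $G_L$.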

Adaptive Multi-Goal Exploration
