Abstract: Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. PEG first chooses goal commands such that the agent's goal-conditioned policy, at its current level of training, will end up in states with high exploration potential. It then launches an exploration policy starting at those promising states. To enable this direct optimization, PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands". In challenging simulated robotics environments including a multi-legged ant robot in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables more efficient and effective training of goal-conditioned policies relative to baselines and ablations. Our ant successfully navigates a long maze, and the robot arm successfully builds a stack of three blocks upon command. Website: https://penn-pal-lab.github.io/peg/

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper attempts to address the problem of how to enable an agent to quickly learn the characteristics of an unknown environment and complete diverse tasks. Specifically, the authors propose a method called "Planning Exploratory Goals" (PEG) within the Goal-Conditioned Reinforcement Learning (GCRL) framework. The PEG method aims to maximize exploration effectiveness by optimizing the agent's goal setting during training.

#### Core Issues

- **Exploration Problem**: How can an agent explore its environment during training so that it can achieve various goals during testing?
- **Goal Selection Problem**: How to select goals during training that can induce effective exploration?

#### Main Contributions

1. **Propose a New Exploration Paradigm**: By directly optimizing goal selection to generate trajectories with high exploration value.
2. **Utilize World Models for Goal Planning**: By adapting algorithms typically used for low-level action sequence planning to effectively achieve goal-directed planning.
3. **Experimental Validation**: Validate the effectiveness of PEG in challenging simulated robotic environments, including a multi-legged ant robot in a maze and a robotic arm on a cluttered tabletop.

### Experimental Results

- In various tasks, PEG performed excellently, especially in the most challenging task of stacking 3 blocks, where PEG was the only method that significantly improved the success rate.
- Compared to other baseline methods (such as MEGA and Skewfit), PEG was able to achieve near-optimal behavior more quickly and performed better in more difficult environments.

### Conclusion

By proposing the PEG method, the authors address issues present in existing methods, such as hard-to-reach goals and ineffective exploration paths. The PEG method can select more valuable goals during training, thereby accelerating the exploration process and improving the success rate of the final tasks.
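The goal-planning idea summarized above (sample candidate goal commands, imagine where the current goal-conditioned policy would end up, and keep the goals whose end states look most worth exploring) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the 1-D environment, the hand-written `imagined_final_state` function standing in for a learned world model and policy, the count-based `exploration_value`, and the cross-entropy-style sampler used in place of whichever sampling-based planner the paper adapts. The second phase, launching an exploration policy from the reached state, is not shown.

```python
import random
import statistics


def imagined_final_state(start, goal, horizon=10, max_step=1.0, competence=6.0):
    """Hypothetical stand-in for a learned world model plus goal-conditioned policy.

    The toy policy walks toward the goal but stalls once it would move more
    than `competence` units from the start, mimicking a partially trained policy.
    """
    state = start
    for _ in range(horizon):
        step = max(-max_step, min(max_step, goal - state))
        if abs(state + step - start) > competence:
            break  # beyond the policy's current competence
        state += step
    return state


def exploration_value(state, visit_counts):
    """Toy novelty score (an assumption): rarely visited integer bins score higher."""
    return 1.0 / (1.0 + visit_counts.get(round(state), 0))


def plan_exploratory_goal(start, visit_counts, iters=5, samples=64, elites=8,
                          init_mean=0.0, init_std=10.0, seed=0):
    """Cross-entropy-style search over goal commands instead of action sequences."""
    rng = random.Random(seed)
    mean, std = init_mean, init_std
    best_goal, best_score = start, float("-inf")
    for _ in range(iters):
        goals = [rng.gauss(mean, std) for _ in range(samples)]
        # Score each goal by the exploration value of its imagined end state.
        scored = sorted(
            ((exploration_value(imagined_final_state(start, g), visit_counts), g)
             for g in goals),
            reverse=True,
        )
        if scored[0][0] > best_score:
            best_score, best_goal = scored[0]
        # Refit the sampling distribution to the highest-scoring goals.
        elite_goals = [g for _, g in scored[:elites]]
        mean = statistics.fmean(elite_goals)
        std = max(statistics.pstdev(elite_goals), 1e-3)
    return best_goal, best_score


if __name__ == "__main__":
    # Bins -3..3 have been visited often; everything else is unvisited.
    counts = {b: 50 for b in range(-3, 4)}
    goal, score = plan_exploratory_goal(0.0, counts)
    print("goal command:", goal, "imagined end state:", imagined_final_state(0.0, goal))
```

The point of the sketch is that a goal is scored by where the policy is predicted to actually end up, not by the novelty of the goal itself. A far-away goal that the toy policy cannot reach is valued only for the state where the policy stalls, which is how this style of planning sidesteps hard-to-reach goals.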

Planning Goals for Exploration

Generalize Robot Learning from Demonstration to Variant Scenarios with Evolutionary Policy Gradient

Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback

Goal-conditioned Offline Planning from Curious Exploration

Flexible and Efficient Long-Range Planning Through Curious Exploration

Learning to explore by reinforcement over high-level options

Effective State Space Exploration with Phase State Graph Generation and Goal-based Path Planning

Go-Explore: a New Approach for Hard-Exploration Problems

Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

Adaptive Multi-Goal Exploration

Autonomous Scene Exploration Using Experience Enhancement

Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

On the Complexity of Exploration in Goal-Driven Navigation

Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance

Landmark Guided Active Exploration with Stable Low-level Policy Learning

Interesting Object, Curious Agent: Learning Task-Agnostic Exploration

GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling

Never Give Up: Learning Directed Exploration Strategies

Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

Learning Exploration Policies for Navigation