Planning Goals for Exploration

Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
2023-03-23
Abstract: Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. PEG first chooses goal commands such that the agent's goal-conditioned policy, at its current level of training, will end up in states with high exploration potential. It then launches an exploration policy starting at those promising states. To enable this direct optimization, PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands". In challenging simulated robotics environments including a multi-legged ant robot in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables more efficient and effective training of goal-conditioned policies relative to baselines and ablations. Our ant successfully navigates a long maze, and the robot arm successfully builds a stack of three blocks upon command. Website: https://penn-pal-lab.github.io/peg/
Machine Learning, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses how an agent can quickly learn the characteristics of an unknown environment and complete diverse tasks within it. Specifically, the authors propose a method called "Planning Exploratory Goals" (PEG) within the Goal-Conditioned Reinforcement Learning (GCRL) framework. PEG aims to maximize exploration by optimizing the goals the agent sets for itself during training.

#### Core Issues

- **Exploration Problem**: How can an agent explore its environment during training so that it can achieve various goals at test time?
- **Goal Selection Problem**: How should goals be selected during training so that they induce effective exploration?

#### Main Contributions

1. **Propose a New Exploration Paradigm**: Goal selection is optimized directly so that the resulting trajectories have high exploration value.
2. **Utilize World Models for Goal Planning**: Sampling-based planning algorithms typically used to plan low-level action sequences are adapted to plan goal commands.
3. **Experimental Validation**: PEG is validated in challenging simulated robotic environments, including a multi-legged ant robot in a maze and a robotic arm on a cluttered tabletop.

### Experimental Results

- PEG performed well across the tasks, most notably in the most challenging one, stacking three blocks, where it was the only method that significantly improved the success rate.
- Compared to other baseline methods (such as MEGA and Skew-Fit), PEG reached near-optimal behavior more quickly and performed better in the more difficult environments.

### Conclusion

With PEG, the authors address issues present in existing methods, such as goals that are hard to reach and exploration paths that are ineffective. PEG selects more valuable goals during training, thereby accelerating exploration and improving the success rate on the final tasks.
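
To make the idea of "planning goal commands" more concrete, the following is a minimal toy sketch, not the authors' implementation. It rests on several assumptions that do not come from the paper: the learned world model and goal-conditioned policy are replaced by a hand-written straight-line rollout with a limited `reach` (`imagine_final_state`), the intrinsic exploration reward is replaced by a simple nearest-neighbor novelty score (`exploration_value`), and the cross-entropy method is used as one possible sampling-based planner (`plan_goal`). All function and parameter names are hypothetical. PEG as described above also launches an exploration policy from the reached state, which this sketch omits.

```python
import math
import random

Point = tuple[float, float]


def imagine_final_state(start: Point, goal: Point, reach: float) -> Point:
    """Toy stand-in for rolling out the goal-conditioned policy in a world model.

    Assumption: the agent walks straight toward the goal but covers at most
    `reach` distance, mimicking a policy at its current level of training.
    """
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist <= reach:
        return goal
    scale = reach / dist
    return (start[0] + dx * scale, start[1] + dy * scale)


def exploration_value(state: Point, visited: list[Point]) -> float:
    """Toy novelty score (assumption): distance to the nearest visited state."""
    return min(math.dist(state, v) for v in visited)


def plan_goal(start: Point, visited: list[Point], reach: float,
              iters: int = 5, samples: int = 64, elites: int = 8,
              seed: int = 0) -> Point:
    """Cross-entropy search over goal commands instead of action sequences."""
    rng = random.Random(seed)
    mean = [start[0], start[1]]
    std = [2.0 * reach, 2.0 * reach]
    best_goal, best_value = start, -math.inf
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            goal = (rng.gauss(mean[0], std[0]), rng.gauss(mean[1], std[1]))
            # Score a goal by where the policy is predicted to END UP,
            # not by how novel the goal itself is.
            final = imagine_final_state(start, goal, reach)
            scored.append((exploration_value(final, visited), goal))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_value:
            best_value, best_goal = scored[0]
        top = [goal for _, goal in scored[:elites]]
        # Refit the sampling distribution to the elite goals.
        for d in range(2):
            mean[d] = sum(g[d] for g in top) / elites
            var = sum((g[d] - mean[d]) ** 2 for g in top) / elites
            std[d] = math.sqrt(var) + 1e-3
    return best_goal


# The agent has so far only walked along the segment from (0, 0) to (1, 0).
visited = [(0.1 * i, 0.0) for i in range(11)]
start = (0.0, 0.0)
goal = plan_goal(start, visited, reach=1.0)
final = imagine_final_state(start, goal, reach=1.0)
print("planned goal:", goal)
print("predicted final state:", final)
print("novelty of final state:", exploration_value(final, visited))
```

The design point this sketch tries to capture is that a goal is scored by the state the current policy is predicted to reach when given that goal, so a far-away goal the policy cannot yet reach earns no credit beyond the point where the rollout stalls.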