Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song
DOI: https://doi.org/10.48550/arXiv.1903.00070
2020-02-23
Abstract: We propose a meta path planning algorithm named Neural Exploration-Exploitation Trees (NEXT) for learning from prior experience for solving new path planning problems in high-dimensional continuous state and action spaces. Compared to more classical sampling-based methods like RRT, our approach achieves much better sample efficiency in high dimensions and can benefit from prior experience of planning in similar environments. More specifically, NEXT exploits a novel neural architecture which can learn promising search directions from problem structures. The learned prior is then integrated into a UCB-type algorithm to achieve an online balance between exploration and exploitation when solving a new problem. We conduct thorough experiments to show that NEXT accomplishes new planning problems with more compact search trees and significantly outperforms state-of-the-art methods on several benchmarks.
Machine Learning, Robotics
What problem does this paper attempt to address?
This paper addresses path planning in high-dimensional continuous state and action spaces, and in particular how to learn from past experience so that new path planning problems can be solved more efficiently. Specifically:

1. **Improving sample efficiency**: Traditional sampling-based methods (such as RRT) require a large number of samples to find a feasible solution in high-dimensional spaces. The proposed method aims to significantly reduce the number of required samples by leveraging prior experience.
2. **Balancing exploration and exploitation**: When solving a new problem, the method balances exploration and exploitation online. It does so by integrating the learned neural prior into a UCB (Upper Confidence Bound)-type algorithm.
3. **Generality**: Compared with some existing learning-based planners, NEXT can handle higher-dimensional continuous state spaces. Its architecture embeds a high-dimensional continuous state space into a low-dimensional discrete space and applies neural planning modules there to extract planning representations.
4. **Improved search directions**: NEXT uses a novel attention-based neural architecture that learns promising search directions from the problem structure. This lets NEXT adapt to path planning problems in different environments and benefit from previous planning experience in similar environments.

By combining these features, the Neural Exploration-Exploitation Trees (NEXT) algorithm significantly outperforms existing methods on several benchmark tasks, showing a higher success rate and better solution quality.
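
To make the idea of combining a learned prior with a UCB-type rule more concrete, here is a minimal sketch in Python. It is a generic UCB1-style selection rule, not the exact scoring function from the paper. The function name `ucb_select`, the parameter `c`, and the inputs `prior_values` and `visit_counts` are all hypothetical: `prior_values` stands in for whatever value estimate a learned model would assign to each candidate tree node, and `visit_counts` for how often each node has already been expanded.

```python
import math


def ucb_select(prior_values, visit_counts, c=1.0):
    """Pick the candidate with the highest UCB-style score.

    prior_values: hypothetical value estimates (e.g. from a learned model).
    visit_counts: how many times each candidate has been expanded so far.
    c:            exploration weight; c = 0 gives pure exploitation.
    """
    total = sum(visit_counts)
    best_index, best_score = 0, -math.inf
    for i, (value, visits) in enumerate(zip(prior_values, visit_counts)):
        # Exploration bonus shrinks as a candidate is visited more often.
        bonus = c * math.sqrt(math.log(1 + total) / (1 + visits))
        score = value + bonus
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# A rarely visited node can win despite a lower value estimate.
chosen = ucb_select([0.5, 0.4], [100, 0], c=1.0)
```

With `c=1.0` the unvisited second candidate is chosen because its exploration bonus outweighs the small gap in estimated value. With `c=0.0` the rule falls back to picking the highest estimate.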