Abstract: End-to-end neural networks have attracted considerable attention in recent years. Compared with traditional modular pipelines, the end-to-end paradigm reduces accumulated error and avoids information loss between stages, which makes it attractive for autonomous driving. However, current end-to-end designs easily lose useful information during training because mapping high-dimensional visual observations to navigation waypoints is a complex problem. Since each future navigation point is inferred from the preceding one, planning resembles a sequence generation task. Inspired by the success of neural language models, we propose an end-to-end framework that recasts planning as a language sequence generation task conditioned on pixel inputs. The proposed method first extracts image features and transforms them from the camera view to the bird's-eye view (BEV). The target navigation point is then encoded as a text sequence that serves as the prompt of a vision-language transformer. Finally, an auto-regressive transformer decoder receives the BEV features and the text sequence and generates the waypoints sequentially. Overall, the proposed method makes full use of environmental information and expresses the planned trajectory as a language sequence, so that the model learns the correspondence between trajectory sequences and images. We conduct extensive experiments on CARLA benchmarks, and our model achieves state-of-the-art performance compared with other vision-based methods.
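To make the idea of expressing a trajectory as a language sequence concrete, the following is a minimal sketch of one possible encoding, in which each continuous coordinate is quantized into an integer token and the target point is placed first as the prompt. The abstract does not specify the tokenization, so the bin count, the coordinate range, and all function names here are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# All constants are illustrative assumptions, not values from the paper.
NUM_BINS = 256      # hypothetical number of discrete bins per coordinate
COORD_MIN = -32.0   # hypothetical lower bound of the BEV range, in metres
COORD_MAX = 32.0    # hypothetical upper bound of the BEV range, in metres

Waypoint = Tuple[float, float]


def quantize(value: float) -> int:
    """Map a continuous coordinate to an integer token in [0, NUM_BINS - 1]."""
    clipped = min(max(value, COORD_MIN), COORD_MAX)
    scaled = (clipped - COORD_MIN) / (COORD_MAX - COORD_MIN)
    return min(int(scaled * NUM_BINS), NUM_BINS - 1)


def dequantize(token: int) -> float:
    """Map a token back to the centre of its bin."""
    width = (COORD_MAX - COORD_MIN) / NUM_BINS
    return COORD_MIN + (token + 0.5) * width


def encode_points(points: List[Waypoint]) -> List[int]:
    """Flatten (x, y) points into a token list: x0, y0, x1, y1, ..."""
    tokens: List[int] = []
    for x, y in points:
        tokens.extend([quantize(x), quantize(y)])
    return tokens


def decode_tokens(tokens: List[int]) -> List[Waypoint]:
    """Rebuild (x, y) points from consecutive token pairs."""
    return [
        (dequantize(tokens[i]), dequantize(tokens[i + 1]))
        for i in range(0, len(tokens) - 1, 2)
    ]


def build_sequence(target: Waypoint, waypoints: List[Waypoint]) -> List[int]:
    """Place the target-point tokens first (the prompt), then the waypoints."""
    return encode_points([target]) + encode_points(waypoints)
```

Under these assumptions, an auto-regressive decoder would be trained to predict the waypoint tokens that follow the prompt tokens, and `decode_tokens` would turn its output back into metric coordinates with a quantization error of at most half a bin width.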

Regression Planning Networks

FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models

Sampling from Pre-Images to Learn Heuristic Functions for Classical Planning

Universal Planning Networks

Visual Robot Task Planning

What Planning Problems Can A Relational Neural Network Solve?

Network planning with deep reinforcement learning

Planning in a recurrent neural network that plays Sokoban

NSP: A Neuro-Symbolic Natural Language Navigational Planner

Neural-Guided Runtime Prediction of Planners for Improved Motion and Task Planning with Graph Neural Networks

Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition

Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning

A Deep Q Network for Robotic Planning from Image

Pix2Planning: End-to-End Planning by Vision-language Model for Autonomous Driving on Carla Simulator

HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks

Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

Neural-Network-Driven Method for Optimal Path Planning via High-Accuracy Region Prediction

NRTIRL Based NN-RRT* Path Planner in Human-Robot Interaction Environment

Neural Informed RRT*: Learning-based Path Planning with Point Cloud State Representations under Admissible Ellipsoidal Constraints

Learning to Imagine Manipulation Goals for Robot Task Planning

Learning Visual Planning Models from Partially Observed Images