Abstract: The well-established modular autonomous driving system is decoupled into standalone tasks, e.g. perception, prediction and planning, and suffers from information loss and error accumulation across modules. In contrast, end-to-end paradigms unify multiple tasks into a fully differentiable framework, allowing for optimization in a planning-oriented spirit. Despite the great potential of end-to-end paradigms, both the performance and efficiency of existing methods are unsatisfactory, particularly in terms of planning safety. We attribute this to the computationally expensive BEV (bird's eye view) features and the straightforward design for prediction and planning. To this end, we explore the sparse representation and review the task design for end-to-end autonomous driving, proposing a new paradigm named SparseDrive. Concretely, SparseDrive consists of a symmetric sparse perception module and a parallel motion planner. The sparse perception module unifies detection, tracking and online mapping with a symmetric model architecture, learning a fully sparse representation of the driving scene. For motion prediction and planning, we review the great similarity between these two tasks, leading to a parallel design for the motion planner. Based on this parallel design, which models planning as a multi-modal problem, we propose a hierarchical planning selection strategy, which incorporates a collision-aware rescore module, to select a rational and safe trajectory as the final planning output. With such effective designs, SparseDrive surpasses the previous state of the art by a large margin on all tasks, while achieving much higher training and inference efficiency. Code will be available at https://github.com/swc-17/SparseDrive to facilitate future research.
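
The collision-aware rescore idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the SparseDrive implementation: the function names (`collides`, `rescore`, `select_plan`), the point-wise distance check, the `safe_dist` threshold, and the choice to zero the score of a colliding candidate are all assumptions made for illustration. It only shows how multi-modal planning candidates might be re-ranked against predicted agent trajectories before the final one is chosen.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]  # one (x, y) waypoint per future timestep


def collides(ego: Trajectory, agent: Trajectory, safe_dist: float) -> bool:
    """True if ego and agent are closer than safe_dist at the same timestep."""
    return any(math.dist(p, q) < safe_dist for p, q in zip(ego, agent))


def rescore(
    candidates: Sequence[Trajectory],
    scores: Sequence[float],
    agents: Sequence[Trajectory],
    safe_dist: float = 2.0,  # hypothetical threshold in metres
) -> List[float]:
    """Assumed rule: a candidate that collides with any agent gets score 0."""
    return [
        0.0 if any(collides(traj, a, safe_dist) for a in agents) else s
        for traj, s in zip(candidates, scores)
    ]


def select_plan(
    candidates: Sequence[Trajectory],
    scores: Sequence[float],
    agents: Sequence[Trajectory],
    safe_dist: float = 2.0,
) -> int:
    """Return the index of the highest-scoring candidate after rescoring."""
    new_scores = rescore(candidates, scores, agents, safe_dist)
    return max(range(len(new_scores)), key=new_scores.__getitem__)


# Two planning modes: drive straight (higher confidence) or swerve right.
candidates = [
    [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)],
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],
]
scores = [0.6, 0.4]
# One predicted agent approaching head-on in the ego lane.
agents = [[(0.0, 5.0), (0.0, 4.0), (0.0, 3.5)]]

print(rescore(candidates, scores, agents))      # [0.0, 0.4]
print(select_plan(candidates, scores, agents))  # 1
```

In this toy scene the straight candidate has the higher raw confidence but comes within 0.5 m of the agent at the last timestep, so its score is zeroed and the swerving candidate is selected. With no agents present, the original ranking is unchanged.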

HE-Drive: Human-Like End-to-End Driving with Vision Language Models

Learning Accurate, Comfortable and Human-like Driving

Humanlike Driving: Empirical Decision-Making System for Autonomous Vehicles

DeepGoal: Learning to drive with driving intention from human control demonstration

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

On-Board Vision-Language Models for Personalized Autonomous Vehicle Motion Control: System Design and Real-World Validation

A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving

Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation

DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

VTGNet: A Vision-based Trajectory Generation Network for Autonomous Vehicles in Urban Environments

Hierarchical Learned Risk-Aware Planning Framework for Human Driving Modeling

End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners

VLM-Auto: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes

HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving

DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving

VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision