Abstract: End-to-end autonomous driving aims to build a fully differentiable system that takes raw sensor data as input and directly outputs the planned trajectory or control signals of the ego vehicle. State-of-the-art methods usually follow the 'Teacher-Student' paradigm. The teacher model uses privileged information (ground-truth states of surrounding agents and map elements) to learn the driving strategy. The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model. By eliminating the noise of the perception part during planning learning, state-of-the-art works can achieve better performance with significantly less data than coupled approaches. However, under the current Teacher-Student paradigm, the student model still needs to learn a planning head from scratch, which can be challenging due to the redundant and noisy nature of raw sensor inputs and the causal confusion issue of behavior cloning. In this work, we aim to explore the possibility of directly adopting the strong teacher model to conduct planning while letting the student model focus more on the perception part. We find that even when equipped with a SOTA perception model, directly letting the student model learn the required inputs of the teacher model leads to poor driving performance, which comes from the large distribution gap between the predicted privileged inputs and the ground truth. To this end, we propose DriveAdapter, which employs adapters with a feature alignment objective function between the student (perception) and teacher (planning) modules. Additionally, since the pure learning-based teacher model itself is imperfect and occasionally breaks safety rules, we propose a method of action-guided feature learning with a mask for those imperfect teacher features to further inject the priors of hand-crafted rules into the learning process.

What problem does this paper attempt to address?

The paper addresses a key challenge in end-to-end autonomous driving: how to separate the perception and planning modules so that the system avoids the causal confusion problem of behavior cloning while improving overall performance and data efficiency.

Under the existing "teacher-student" paradigm, the teacher model learns to drive from privileged inputs (such as the ground-truth states of surrounding agents), which makes learning efficient. The student model, however, only sees raw sensor data and still has to learn a planning strategy from scratch through behavior cloning. Because raw sensor inputs are redundant and noisy, this step is prone to causal confusion.

The paper proposes a new paradigm called DriveAdapter, which places an adapter module between the teacher model (a reinforcement learning model trained with privileged information) and the student model (a model that only has access to raw sensor data). The student model focuses on perception, while the frozen teacher model is reused for planning. DriveAdapter introduces a feature alignment objective function and an action-guided feature learning method to reduce the distribution gap between the predicted privileged inputs and the ground-truth inputs, and to encourage the resulting behaviors to comply with predefined safety rules.

The paper also handles the case of an imperfect teacher model with a masked feature distillation strategy. When the teacher's output is overridden by hand-crafted rules, the feature alignment loss is masked out for the affected features and supervision comes directly from the action loss, so the model learns features that produce good actions rather than merely imitating the teacher.

Overall, DriveAdapter decouples perception from planning by inserting an adapter module between the student and teacher models, which avoids having the student learn planning through behavior cloning and aims to improve the performance and robustness of the driving system.
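The masked feature distillation idea described above can be illustrated with a small sketch. This is not the paper's implementation: the function names, the use of an L1 distance, the per-sample mask, and the choice to apply the action loss to every sample are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def mean_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute (L1) difference between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("vectors must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def masked_distillation_loss(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    student_actions: Sequence[Sequence[float]],
    target_actions: Sequence[Sequence[float]],
    teacher_overridden: Sequence[bool],
    action_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> Tuple[float, float, float]:
    """Return (feature_loss, action_loss, total_loss) for one batch.

    Assumption: samples where rules overrode the teacher contribute no
    feature alignment loss, while the action loss is applied to all samples.
    """
    # Feature alignment only on samples where the teacher was NOT overridden.
    feat_terms = [
        mean_abs_error(s, t)
        for s, t, overridden in zip(student_feats, teacher_feats, teacher_overridden)
        if not overridden
    ]
    feature_loss = sum(feat_terms) / len(feat_terms) if feat_terms else 0.0

    # Action supervision on every sample, including overridden ones.
    action_terms = [
        mean_abs_error(a, g) for a, g in zip(student_actions, target_actions)
    ]
    action_loss = sum(action_terms) / len(action_terms)

    return feature_loss, action_loss, feature_loss + action_weight * action_loss


# Toy batch: the second sample's teacher output was overridden by a rule,
# so its (large) feature mismatch is ignored and only its action error counts.
losses = masked_distillation_loss(
    student_feats=[[1.0, 2.0], [0.0, 0.0]],
    teacher_feats=[[1.0, 4.0], [10.0, 10.0]],
    student_actions=[[0.5], [0.0]],
    target_actions=[[0.5], [1.0]],
    teacher_overridden=[False, True],
)
print(losses)  # (1.0, 0.5, 1.5)
```

In the toy batch, the overridden sample would otherwise pull the student toward teacher features that led to a rule-violating action; masking it leaves the rule-corrected action as the only supervision signal for that sample.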

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
