Abstract: End-to-end autonomous driving provides a simple and efficient framework for autonomous driving systems, obtaining control commands directly from raw perception data. However, it does not address the stability and interpretability problems that arise in complex urban scenarios. In this paper, we construct a two-stage end-to-end autonomous driving model for complex urban scenarios, named HIIL (Hierarchical Interpretable Imitation Learning), which integrates an interpretable Bird's Eye View (BEV) mask and a steering angle to address these problems. In Stage One, we propose a pretrained BEV model that uses a BEV mask to present an interpretation of the surrounding environment. In Stage Two, we construct an Interpretable Imitation Learning (IIL) model that fuses the BEV latent feature from Stage One with an additional steering angle from the Pure-Pursuit (PP) algorithm. In the HIIL model, visual information is converted to semantic images by a semantic segmentation network, and the semantic images are encoded to extract the BEV latent feature, which is both decoded to predict BEV masks and fed to the IIL model as perception data. In this way, the BEV latent feature bridges the BEV and IIL models. Visual information is supplemented by the steering angle calculated by the PP algorithm, the speed vector, and location information, so the model can perform better in complex and adverse scenarios. Our HIIL model thus targets the need for interpretability and robustness in autonomous driving. We validate the proposed model in the CARLA simulator with extensive experiments, which show strong interpretability, generalization, and robustness in unknown scenarios for navigation tasks.
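As an illustration of the kind of steering angle the PP algorithm supplies, the following is a minimal sketch of the textbook Pure-Pursuit law for a kinematic bicycle model. It is not taken from the paper: the function name, the parameter names, the choice of the rear axle as the reference point, and the sign convention (positive angles steer left) are all assumptions made for this example, and the paper's own waypoint selection and vehicle parameters are not specified in the abstract.

```python
import math


def pure_pursuit_steering(x, y, yaw, target_x, target_y, wheelbase):
    """Textbook Pure-Pursuit steering angle in radians (hypothetical helper).

    (x, y, yaw) is the assumed rear-axle pose in a world frame,
    (target_x, target_y) is the look-ahead waypoint on the route,
    and wheelbase is the axle-to-axle distance in the same length unit.
    """
    dx = target_x - x
    dy = target_y - y
    lookahead = math.hypot(dx, dy)  # distance to the look-ahead point
    if lookahead < 1e-9:
        return 0.0  # already at the target: no steering correction
    # Angle between the vehicle heading and the line to the target.
    alpha = math.atan2(dy, dx) - yaw
    # delta = atan(2 * L * sin(alpha) / l_d), written with atan2 for safety.
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)


# A target straight ahead needs no steering; a target to the left
# yields a positive angle under the convention assumed here.
straight = pure_pursuit_steering(0.0, 0.0, 0.0, 5.0, 0.0, wheelbase=2.5)
left = pure_pursuit_steering(0.0, 0.0, 0.0, 5.0, 1.0, wheelbase=2.5)
```

In a pipeline like the one described, such an angle would be one extra input alongside the speed vector and location information; how it is normalized and fused with the BEV latent feature is a design decision of the model and is not shown here.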

End-to-end Driving via Conditional Imitation Learning

Imitation Learning of Hierarchical Driving Model: from Continuous Intention to Continuous Trajectory

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

End-to-End Steering for Autonomous Vehicles via Conditional Imitation Co-Learning

Conditional Affordance Learning for Driving in Urban Environments

Conditional Driving from Natural Language Instructions

DeepGoal: Learning to drive with driving intention from human control demonstration

End-to-end Driving Deploying Through Uncertainty-Aware Imitation Learning and Stochastic Visual Domain Adaptation

Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

CCIL: Context-conditioned imitation learning for urban driving

SAM: Squeeze-and-Mimic Networks for Conditional Visual Driving Policy Learning

Autonomous Vehicle Control: End-to-end Learning in Simulated Urban Environments

End-to-End Driving via Self-Supervised Imitation Learning Using Camera and LiDAR Data

Autonomous driving in traffic with end-to-end vision-based deep learning

Dynamic Conditional Imitation Learning for Autonomous Driving

Addressing Limitations of State-Aware Imitation Learning for Autonomous Driving

Deep Imitation Learning for Autonomous Driving in Generic Urban Scenarios with Enhanced Safety

Enhancing scene understanding based on deep learning for end-to-end autonomous driving

Hierarchical Interpretable Imitation Learning for End-to-End Autonomous Driving

Safe Imitation Learning on Real-Life Highway Data for Human-like Autonomous Driving

End-to-end driving simulation via angle branched network