A Camera-Based End-to-End Autonomous Driving Framework Combined with Meta-Based Multi-task Optimization

Zhongyu Rao, Yingfeng Cai, Hai Wang, Long Chen, Yicheng Li, Qingchao Liu
DOI: https://doi.org/10.1109/tte.2024.3462449
IF: 6.519
2024-01-01
IEEE Transactions on Transportation Electrification
Abstract: Most existing autonomous driving pipelines fall into two broad categories: those based on a modular framework, which can propagate errors between modules, and those based on an end-to-end framework, which lack interpretability. To overcome these challenges, we propose a novel vision-based multi-task framework that incorporates motion planning, Bird’s Eye View (BEV) map generation, BEV object prediction, depth estimation, semantic segmentation, and velocity prediction. In particular, we present an improved view transformation module that transforms feature maps into BEV space and predicts future waypoints in BEV space. The multi-task framework can improve performance by sharing information across tasks, and the outputs of the individual tasks also improve interpretability. In addition, to address negative transfer, we introduce inter-task affinity, which provides a rough estimate of the relationship between tasks. Moreover, because these relationships may change during training, we use a meta-based multi-task optimization method to dynamically adjust the multi-task weighting. We evaluate the performance of our proposed model using the Longest6 and Town05 Long benchmarks of the CARLA simulator. Our model outperforms the current state-of-the-art camera-based models and achieves results competitive with multi-modal methods on both benchmarks. These results demonstrate the considerable potential of our proposed model for autonomous driving systems. We have also prepared an autonomous driving demonstration using the CARLA simulator, which is available at https://www.youtube.com/watch?v=ctngFH4GSBc.
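The abstract does not give the formula used for inter-task affinity. As an illustration only, the sketch below assumes one definition found in the multi-task learning literature: take a gradient step on task i alone, then measure the relative change in task j's loss. The quadratic toy losses, the function names, and the learning rate are all hypothetical and are not taken from the paper.

```python
def quad_loss(theta, target):
    """Toy per-task loss: squared distance from the shared parameters to a task target."""
    return sum((t - c) ** 2 for t, c in zip(theta, target))


def quad_grad(theta, target):
    """Gradient of quad_loss with respect to the shared parameters."""
    return [2.0 * (t - c) for t, c in zip(theta, target)]


def inter_task_affinity(theta, targets, lr=0.1):
    """Return a matrix A where A[i][j] = 1 - L_j(theta after a step on task i) / L_j(theta).

    A[i][j] > 0 means a step on task i also lowers task j's loss (positive transfer);
    A[i][j] < 0 means it raises it (negative transfer).
    Assumes every task loss is non-zero at theta (hypothetical toy setting).
    """
    n = len(targets)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Lookahead: update the shared parameters using only task i's gradient.
        grad_i = quad_grad(theta, targets[i])
        lookahead = [t - lr * g for t, g in zip(theta, grad_i)]
        for j in range(n):
            before = quad_loss(theta, targets[j])
            after = quad_loss(lookahead, targets[j])
            affinity[i][j] = 1.0 - after / before
    return affinity


if __name__ == "__main__":
    # Tasks 0 and 1 pull the parameters the same way; task 2 pulls the opposite way.
    A = inter_task_affinity([0.0, 0.0], [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
    for row in A:
        print([round(v, 3) for v in row])
```

In this toy setting, a step on task 0 helps task 1 and hurts task 2, so the first row has a positive entry for task 1 and a negative entry for task 2. A dynamic weighting scheme such as the meta-based one the abstract mentions could in principle use signals of this kind, but how the paper does so is not stated here.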