Abstract: A vision-based autonomous driving perception system necessitates the accomplishment of a suite of tasks, including vehicle detection, drivable area segmentation, and lane line segmentation. In light of the limited computational resources available, multi-task learning has emerged as the preeminent methodology for crafting such systems. In this article, we introduce a highly efficient end-to-end multi-task learning model that showcases promising performance on all fronts. Our approach entails the development of a reliable feature extraction network by introducing a feature extraction module called C2SPD. Moreover, to account for the disparities among various tasks, we propose a dual-neck architecture. Finally, we present an optimized design for the decoders of each task. Our model evinces strong performance on the demanding BDD100K dataset, attaining remarkable accuracy (Acc) in vehicle detection and superior precision in drivable area segmentation (mIoU). In addition, this is the first work that can process these three visual perception tasks simultaneously in real time on an embedded device (the Atlas 200I A2) while maintaining excellent accuracy.

What problem does this paper attempt to address?

The problem this paper addresses is how to design an efficient, real-time, and accurate multi-task learning network for three key tasks in autonomous driving perception (vehicle detection, drivable area segmentation, and lane line segmentation) under limited computational resources. Specifically, the paper proposes YOLOP-DN, a multi-task network based on a dual-neck structure, which aims to overcome the shortcomings of existing methods in detection and segmentation accuracy while keeping computational resource consumption low and real-time performance high. By introducing the efficient feature extraction module C2SPD and an optimized decoder design, the network runs in real time on embedded devices and achieves a significant performance improvement on the BDD100K dataset.

### Main contributions:

1. **Proposed a new feature extraction backbone network**: Introducing the C2SPD module improves feature extraction ability while keeping the number of parameters and the inference speed under control.
2. **Designed a dual-neck structure**: This structure serves the different feature requirements of the detection and segmentation tasks separately, avoiding the insufficient information complementarity caused by a single-neck structure.
3. **Developed an end-to-end multi-task learning network, YOLOP-DN**: This network performs well on the BDD100K dataset and has been successfully deployed on the Atlas 200I A2 embedded device for real-time inference.

### Experimental results:

- **Vehicle detection**: YOLOP-DN reaches 78.1% mAP50, which is 1.6% higher than the baseline model; recall reaches 90.5%, which is 1.3% higher than the baseline model.
- **Drivable area segmentation**: YOLOP-DN reaches 92.0% mIoU, which is 0.5% higher than the baseline model.
- **Lane line segmentation**: YOLOP-DN reaches 73.8% accuracy and 27.3% IoU, which are 3.3% and 1.1% higher than the baseline model, respectively.

### Model parameters and inference speed:

- YOLOP-DN has 10.9M parameters and an inference speed of 91 fps. Compared with the baseline model YOLOP (7.9M parameters, 125 fps) and the state-of-the-art YOLOPv2 (38.9M parameters, 168 fps), YOLOP-DN strikes a good balance between the number of parameters and the inference speed.

### Conclusion:

Through the above improvements, YOLOP-DN delivers strong results in network performance, computational resource consumption, and real-time operation, providing solid support for the practical application of autonomous driving perception systems.
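
For readers less familiar with the segmentation metrics quoted above, the following is a minimal sketch of how per-class IoU, mIoU, and pixel accuracy are commonly computed from flattened label maps. It is not the paper's evaluation code: the function names and the toy labels are hypothetical, and the exact BDD100K protocol (for example, whether lane accuracy is class-balanced) is not given in this summary, so the plain definitions are assumed here.

```python
from typing import Sequence


def class_iou(pred: Sequence[int], target: Sequence[int], cls: int) -> float:
    """Intersection over union for one class over flattened label sequences."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    # A class absent from both prediction and target has no defined IoU.
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of the per-class IoUs, skipping classes with an undefined IoU."""
    ious = [class_iou(pred, target, c) for c in range(num_classes)]
    valid = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid) if valid else float("nan")


def pixel_accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of pixels whose predicted label matches the target label."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


# Hypothetical 6-pixel example: 0 = background, 1 = drivable area (or lane).
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]

print(class_iou(pred, target, 1))    # IoU of the foreground class
print(mean_iou(pred, target, 2))     # mIoU over both classes
print(pixel_accuracy(pred, target))  # overall pixel accuracy
```

In this toy case the foreground class overlaps on 2 pixels out of a union of 4, which illustrates why thin structures such as lane lines can show a modest IoU even when pixel accuracy looks considerably higher.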

A Multi-Task Network Based on Dual-Neck Structure for Autonomous Driving Perception

A Multi-Task Road Feature Extraction Network with Grouped Convolution and Attention Mechanisms

LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving

YOLOMH: you only look once for multi-task driving perception with high efficiency

HybridNets: End-to-End Perception Network

SDAPNet: End-to-End Multi-task Simultaneous Detection and Prediction Network.

Multi-Task Learning in Autonomous Driving Scenarios Via Adaptive Feature Refinement Networks

Multi-Task Environmental Perception Methods for Autonomous Driving

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

Joint Semantic Understanding with a Multilevel Branch for Driving Perception

Real-Time Monocular Joint Perception Network for Autonomous Driving

Enhanced encoder–decoder architecture for visual perception multitasking of autonomous driving

CenterPNets: A Multi-Task Shared Network for Traffic Perception

Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator

NeurAll: Towards a Unified Visual Perception Model for Automated Driving

Mobip: a lightweight model for driving perception using MobileNet

Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation

Multi-Task Visual Perception for Object Detection and Semantic Segmentation in Intelligent Driving

V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network

Multi-Task Deep Learning Model for Autonomous Driving: Object Detection, Semantic Segmentation, and Depth Estimation

Multitask Network for Joint Object Detection, Semantic Segmentation and Human Pose Estimation in Vehicle Occupancy Monitoring