Abstract: To address environmental perception for autonomous driving on structured roads, we propose MultiNet-GS, a convolutional neural network with an encoder-decoder architecture that handles multiple perception tasks simultaneously. The encoder is built on the main structure of YOLOv8, a recent object detection model. In the feature extraction part, we introduce the dynamic sparse attention mechanism of BiFormer (bi-level routing attention) to allocate computing resources more flexibly, which significantly improves computational efficiency while adding little computational overhead. In the feature fusion part, we introduce GSConv, a lightweight convolution, and use it to rebuild the neck as a slim-neck structure, reducing the computational complexity and inference time of the detector. We also add a detection head for tiny objects to the conventional three-head detector structure. Finally, for lane detection we adopt a guide-line-based method that aggregates lane feature information into multiple key points, obtains a lane heat map response through conditional convolution, and then describes each lane line with an adaptive decoder, addressing shortcomings of traditional lane detection methods. Comparative experiments on the BDD100K dataset, run on the NVIDIA Jetson TX2 embedded platform, give the following results relative to the state-of-the-art YOLOPv2: traffic object detection mAP@0.5 reaches 82.1%, an increase of 2.7%; drivable area detection accuracy reaches 93.2%, an increase of 0.5%; lane detection accuracy reaches 85.7%, an increase of 4.3%; the parameter count and FLOPs are 47.5 M and 117.5, reductions of 6.6 M and 8.3, respectively; and the model runs at 72 FPS, 5 FPS faster.
Among current mainstream models, MultiNet-GS achieves the highest detection accuracy while maintaining good detection speed.
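The GSConv idea can be illustrated without a deep-learning framework. The sketch below is an assumption-labeled illustration, not the MultiNet-GS code: following the public slim-neck reference implementation as we understand it, a dense k × k convolution produces half of the output channels, a 5 × 5 depthwise convolution is applied to that half, the two halves are concatenated, and an even/odd channel shuffle interleaves them. The helper names (`gsconv_shuffle_order`, `conv_macs`, `gsconv_macs`) are hypothetical; the code only counts multiply-accumulates and channel positions and does not run any convolution.

```python
"""Dependency-free sketch of GSConv channel bookkeeping and cost.

Assumption (not from the MultiNet-GS paper): layout follows the public
slim-neck reference code, i.e. dense k x k conv -> C2/2 channels,
5 x 5 depthwise conv on those channels, concat, even/odd channel shuffle.
"""


def gsconv_shuffle_order(c2):
    """Return the channel order produced by the even/odd shuffle.

    Channels 0..c2/2-1 come from the dense conv and c2/2..c2-1 from the
    depthwise conv; taking all even indices, then all odd indices,
    interleaves the two halves.
    """
    if c2 % 2:
        raise ValueError("GSConv needs an even number of output channels")
    concat = list(range(c2))
    return concat[0::2] + concat[1::2]


def conv_macs(h, w, k, c_in, c_out, groups=1):
    """Multiply-accumulates of a k x k conv producing an h x w output map."""
    return h * w * k * k * (c_in // groups) * c_out


def gsconv_macs(h, w, k, c1, c2, dw_k=5):
    """MACs of the hypothetical GSConv layout described above."""
    half = c2 // 2
    dense = conv_macs(h, w, k, c1, half)                         # dense half
    depthwise = conv_macs(h, w, dw_k, half, half, groups=half)  # cheap half
    return dense + depthwise


if __name__ == "__main__":
    std = conv_macs(40, 40, 3, 256, 256)
    gs = gsconv_macs(40, 40, 3, 256, 256)
    print(gsconv_shuffle_order(8), std, gs, round(gs / std, 3))
```

For a 40 × 40 map with 256 input and output channels and k = 3, this count gives GSConv roughly half the multiply-accumulates of a standard convolution, because the depthwise half costs only h · w · 25 operations per channel.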
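BiFormer's bi-level routing attention first prunes attention at the region level, then attends among tokens only inside the retained regions. The pure-Python sketch below shows those two steps for a single head. It is a simplification under stated assumptions: the feature map is already partitioned into regions of token vectors, the query/key/value projections are taken as identity, and BiFormer's local context enhancement branch is omitted. The function names are hypothetical, and a real model would use batched tensor operations rather than lists.

```python
import math


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_regions(q_regions, k_regions, topk):
    """Region level: for each query region, indices of the top-k key regions.

    Region descriptors are token means; affinity is their dot product.
    Ties keep the lower region index (Python's sort is stable).
    """
    q_r = [mean(r) for r in q_regions]
    k_r = [mean(r) for r in k_regions]
    routes = []
    for qi in q_r:
        affinity = [dot(qi, kj) for kj in k_r]
        order = sorted(range(len(k_r)), key=lambda j: affinity[j], reverse=True)
        routes.append(order[:topk])
    return routes


def bi_level_routing_attention(q_regions, k_regions, v_regions, topk):
    """Token level: each query attends only to tokens of its routed regions."""
    d = len(q_regions[0][0])
    routes = route_regions(q_regions, k_regions, topk)
    outputs = []
    for r, idx in enumerate(routes):
        # Gather keys/values from the selected regions only (the sparse step).
        keys = [k for j in idx for k in k_regions[j]]
        values = [v for j in idx for v in v_regions[j]]
        region_out = []
        for q in q_regions[r]:
            weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
            region_out.append([sum(w * v[c] for w, v in zip(weights, values))
                               for c in range(len(values[0]))])
        outputs.append(region_out)
    return outputs, routes
```

Because each region attends to only `topk` regions instead of all of them, token-level cost grows with `topk` rather than with the total number of regions. This is the kind of saving the abstract attributes to dynamic sparse attention.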
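The abstract adds a tiny-object detection head to the usual three heads but does not state its stride. Assuming, as is common in tiny-object YOLO variants, that the extra head operates at stride 4 (P2) alongside strides 8, 16, and 32, the arithmetic below counts the anchor-free prediction points (one per grid cell) each configuration produces. The helper name `prediction_points` is hypothetical.

```python
def prediction_points(height, width, strides):
    """Grid cells (= anchor-free prediction points) for each detection head.

    Assumption: each head at stride s sees a (height // s) x (width // s) grid.
    """
    return {s: (height // s) * (width // s) for s in strides}


if __name__ == "__main__":
    three_heads = prediction_points(640, 640, (8, 16, 32))
    four_heads = prediction_points(640, 640, (4, 8, 16, 32))  # assumed P2 head
    print(sum(three_heads.values()), sum(four_heads.values()))
```

For a 640 × 640 input the three-head layout yields 8,400 prediction points (the familiar YOLOv8 output count), while an added stride-4 head raises this to 34,000, which shows how a tiny-object head trades extra computation for finer spatial resolution.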

MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
