Abstract: Visual perception is a crucial component of autonomous driving systems. Traditional approaches to visual perception in autonomous driving often rely on a single modality and perform semantic segmentation on RGB images alone. For semantic segmentation in autonomous driving, however, a more effective strategy is to leverage multiple modalities. The different sensors of an autonomous driving system capture diverse information, and the complementary features across modalities improve the robustness of the segmentation model. Contrary to the intuitive belief that more modalities lead to better accuracy, our research reveals that adding modalities to traditional semantic segmentation models can sometimes decrease accuracy. Inspired by residual thinking, we propose a multimodal visual perception model that maintains or even improves accuracy when any modality is added. The approach is straightforward: RGB serves as the main branch, and every other modality branch uses the same feature-extraction backbone. The modal score module (MSM) computes channel and spatial scores for the features of every modality, measuring their importance to the overall segmentation. The modality branches then supply complementary features to the RGB main branch through the feature complementary module (FCM). Applying residual thinking further strengthens the feature extraction of all branches. Extensive experiments support several conclusions. Integrating certain modalities into traditional semantic segmentation models tends to reduce segmentation accuracy. In contrast, our simple and scalable multimodal model maintains segmentation accuracy when any additional modality is added.
Moreover, our approach outperforms some state-of-the-art multimodal semantic segmentation models. Finally, ablation experiments on the proposed model confirm that the MSM, the FCM, and the incorporation of residual thinking each contribute significantly to its performance.
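The sketch below shows one way that scored complementary fusion with a residual RGB path could work. It is not the authors' implementation. The abstract does not say how the MSM computes its scores, so the parameter-free mean-plus-sigmoid scoring is an assumption. The functions `channel_scores`, `spatial_scores`, `complement`, and `fuse` are hypothetical, and all modalities are assumed to produce feature maps of the same C×H×W shape, consistent with a shared backbone design. Plain Python lists keep it dependency-free.

```python
import math
from typing import List

# A feature map is a list of C channels, each an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_scores(f: FeatureMap) -> List[float]:
    """Hypothetical MSM channel branch: global average pool per channel, then sigmoid."""
    scores = []
    for ch in f:
        n = sum(len(row) for row in ch)
        mean = sum(sum(row) for row in ch) / n
        scores.append(sigmoid(mean))
    return scores


def spatial_scores(f: FeatureMap) -> List[List[float]]:
    """Hypothetical MSM spatial branch: average over channels at each pixel, then sigmoid."""
    c = len(f)
    h, w = len(f[0]), len(f[0][0])
    return [[sigmoid(sum(f[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]


def complement(f: FeatureMap) -> FeatureMap:
    """Re-weight an auxiliary modality's features by its channel and spatial scores."""
    cs = channel_scores(f)
    ss = spatial_scores(f)
    return [[[f[k][i][j] * cs[k] * ss[i][j] for j in range(len(f[k][i]))]
             for i in range(len(f[k]))]
            for k in range(len(f))]


def fuse(rgb: FeatureMap, others: List[FeatureMap]) -> FeatureMap:
    """Hypothetical FCM: the RGB features (identity/residual path) plus each scored complement."""
    out = [[row[:] for row in ch] for ch in rgb]  # copy so the input RGB map is untouched
    for f in others:
        comp = complement(f)
        for k in range(len(out)):
            for i in range(len(out[k])):
                for j in range(len(out[k][i])):
                    out[k][i][j] += comp[k][i][j]
    return out


rgb = [[[1.0, 2.0], [3.0, 4.0]]]
depth = [[[0.0, 0.0], [0.0, 0.0]]]  # an uninformative extra modality
print(fuse(rgb, [depth]) == rgb)    # the RGB path passes through unchanged
```

The additive identity path is the design choice this sketch is meant to show. An auxiliary modality can only add a scored correction on top of the RGB features. It cannot replace them, so an uninformative modality leaves the main branch as it was. In a trained network this is a tendency that the learned scores must realize, not a guarantee.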