RGB-T semantic segmentation based on cross-operational fusion attention in autonomous driving scenario

Jiyou Zhang, Rongfen Zhang, Wenhao Yuan, Yuhong Liu
DOI: https://doi.org/10.1007/s12530-024-09567-8
IF: 2.347
2024-02-24
Evolving Systems
Abstract: In recent years, semantic segmentation has become a key technology for autonomous driving. However, because driving environments are complex, many semantic segmentation networks trained on RGB images perform poorly under extreme conditions such as smoke, strong light, and low visibility. To address this problem, this article adopts ResNet-152 as the backbone and integrates a novel cross-operational fusion attention module into a dual encoder-decoder model to fully fuse the features of two input modalities: RGB and thermal infrared images. First, the dual encoder extracts feature information from the RGB and thermal infrared images independently. Meanwhile, in the encoder stage, the features are fused through the proposed cross-operational fusion attention mechanism, which reduces conflicts between the modalities and focuses on finer details. In the decoder, to further improve segmentation performance, the up-sampled feature map is concatenated stage by stage with the feature map from the corresponding encoder layer, and features are then extracted by a convolutional layer. After five up-sampling stages, the image resolution is gradually restored, and semantic segmentation is performed at the end. Experimental results on the MFNet dataset show a mean accuracy (mAcc) of 69.3% and a mean intersection over union (mIoU) of 56.2%. The inference speed is about 33 images per second on an NVIDIA GeForce RTX 3090 Ti, which meets the basic requirements of semantic segmentation for autonomous driving in extreme environments.
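The abstract states that five up-sampling stages gradually restore the image resolution. The following is a minimal sketch of why five stages suffice, assuming a standard ResNet-152 encoder with five stride-2 reductions (stem convolution, max-pooling, and layer2 to layer4, giving an output stride of 32) and the 480×640 resolution of MFNet images. The function names are hypothetical, and the code only traces spatial sizes; it does not reproduce the authors' layers or their fusion module.

```python
# Hypothetical sketch: trace feature-map sizes through a ResNet-style encoder
# (five stride-2 reductions, output stride 32) and a decoder of five x2 up-samplings.


def encoder_sizes(height: int, width: int, stages: int = 5) -> list[tuple[int, int]]:
    """Spatial size after each stride-2 reduction (stem conv, max-pool, layer2..layer4)."""
    sizes = []
    h, w = height, width
    for _ in range(stages):
        h, w = h // 2, w // 2  # each reduction halves both sides (floor division)
        sizes.append((h, w))
    return sizes


def decoder_sizes(height: int, width: int, stages: int = 5) -> list[tuple[int, int]]:
    """Spatial size after each x2 up-sampling, starting from the deepest encoder map."""
    sizes = []
    h, w = height, width
    for _ in range(stages):
        h, w = h * 2, w * 2  # each up-sampling doubles both sides
        sizes.append((h, w))
    return sizes


if __name__ == "__main__":
    enc = encoder_sizes(480, 640)
    dec = decoder_sizes(*enc[-1])
    print("encoder:", enc)  # deepest map is (15, 20)
    print("decoder:", dec)  # last map is back at (480, 640)
    # The first four up-sampled maps each match an encoder map of the same size,
    # which is where a stage-wise skip concatenation could be applied.
    for up, skip in zip(dec[:-1], reversed(enc[:-1])):
        assert up == skip
```

With a 480×640 input the deepest map is 15×20, and the final up-sampling returns to 480×640. Inputs whose sides are not multiples of 32 would not round-trip exactly under floor division, so practical implementations typically pad the input or interpolate to the skip connection's size.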
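The reported mAcc and mIoU are per-class averages. The following is a minimal sketch of the standard definitions computed from a confusion matrix, where rows are ground-truth classes and columns are predictions: per-class accuracy is TP / (TP + FN) and per-class IoU is TP / (TP + FP + FN). It assumes flattened label maps. The helper names are hypothetical, and skipping classes that are absent from the ground truth is one common convention that may differ from the evaluation script used in the paper.

```python
# Hypothetical sketch of mean accuracy (mAcc) and mean IoU (mIoU) from a confusion matrix.


def confusion_matrix(gt, pred, num_classes):
    """cm[g][p] counts pixels whose true class is g and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_acc_and_iou(cm):
    """Return (mAcc, mIoU) averaged over classes that occur in the ground truth."""
    n = len(cm)
    accs, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        gt_total = sum(cm[c])                         # TP + FN (row sum)
        pred_total = sum(cm[r][c] for r in range(n))  # TP + FP (column sum)
        if gt_total == 0:
            continue  # assumption: skip classes absent from the ground truth
        accs.append(tp / gt_total)                      # per-class accuracy (recall)
        ious.append(tp / (gt_total + pred_total - tp))  # TP / (TP + FP + FN)
    if not accs:
        raise ValueError("no ground-truth classes present")
    return sum(accs) / len(accs), sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy example: 6 pixels, 3 classes (real use would flatten H x W label maps).
    gt = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    m_acc, m_iou = mean_acc_and_iou(confusion_matrix(gt, pred, 3))
    print(f"mAcc = {m_acc:.4f}, mIoU = {m_iou:.4f}")  # mAcc = 0.6667, mIoU = 0.5000
```

Because IoU also penalizes false positives, it is never higher than per-class accuracy, which is consistent with the reported mIoU (56.2%) being lower than the mAcc (69.3%).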
computer science, artificial intelligence
What problem does this paper attempt to address?