Multimodal Image Semantic Segmentation Based on Attention Mechanism

Zhang Ji-you, Zhang Rong-fen, Liu Yu-hong, Yuan Wen-hao
DOI: https://doi.org/10.37188/cjlcd.2022-0309
2023-01-01
Chinese Journal of Liquid Crystals and Displays
Abstract: Many current semantic segmentation models are trained only on RGB images, so their stability is easily degraded in extreme environments and they cannot meet the practical demands of autonomous driving at night. To address this, a multimodal dual encoder-decoder model integrating a lightweight attention module is constructed, with ResNet-152 as the feature extraction network. The dual encoder extracts key information from the two modalities of RGB-T data (RGB and thermal) and fuses it through the attention module. The extracted feature information is then sent to the decoder, where the upsampled feature map is concatenated stage by stage with the feature map extracted by the corresponding encoder layer, features are extracted by convolution layers, the resolution is restored by upsampling, and semantic segmentation is performed at the final stage. Experimental results show that the proposed model achieves a mean accuracy of 76% and a mean intersection over union of 55.7% on the MFNet test set, an improvement over other network models. The model can largely meet the requirement of accurate semantic segmentation of RGB-T images both day and night.
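The two reported metrics, mean accuracy and mean intersection over union, can be illustrated with a minimal sketch. It assumes the common convention in which mean accuracy is the average per-class recall and both metrics are derived from a pixel-level confusion matrix; the abstract does not state the exact evaluation code, so the function names and the toy labels below are hypothetical.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Pixel-level confusion matrix: rows are ground truth, columns are predictions."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_accuracy_and_miou(cm):
    """Return (mean per-class accuracy, mean IoU), skipping classes with no pixels."""
    cm = cm.astype(np.float64)
    tp = np.diag(cm)              # correctly labelled pixels per class
    gt = cm.sum(axis=1)           # ground-truth pixels per class
    pr = cm.sum(axis=0)           # predicted pixels per class
    union = gt + pr - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        acc = tp / gt             # NaN where a class is absent from the ground truth
        iou = tp / union          # NaN where a class is absent from both maps
    return np.nanmean(acc), np.nanmean(iou)

# Toy 2x3 label maps with 3 classes (illustrative values, not data from the paper).
target = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
cm = confusion_matrix(pred, target, num_classes=3)
m_acc, m_iou = mean_accuracy_and_miou(cm)
print(f"mean accuracy = {m_acc:.4f}, mIoU = {m_iou:.4f}")
```

In practice the confusion matrix would be accumulated over every image in the test set before the per-class ratios are averaged, rather than averaging per-image scores.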