Multispectral Fusion Transformer Network for RGB-Thermal Urban Scene Semantic Segmentation

Heng Zhou, Chunna Tian, Zhenxi Zhang, Qizheng Huo, Yongqiang Xie, Zhongbo Li
DOI: https://doi.org/10.1109/lgrs.2022.3179721
IF: 5.343
2022-01-01
IEEE Geoscience and Remote Sensing Letters
Abstract: Semantic segmentation plays a vital role in autonomous vehicles. Fusing the rich detail of RGB images with the illumination robustness of thermal images has great potential to improve RGB-thermal (RGB-T) semantic segmentation. Current mainstream multispectral feature fusion methods, however, are less effective at characterizing the correlations and complementarities between RGB and thermal features. To generate robust cross-spectral fusion features, we propose a multispectral fusion transformer network (MFTNet). Specifically, we first design a multispectral fusion transformer (MFT) module that handles the intraspectral correlation and interspectral complementarity of RGB-T features in the multispectral fusion encoder. MFT effectively enhances the RGB-T feature representation under various challenges. Then, we propose an optimization strategy with a progressive deep supervision (PDS) loss that directly supervises the upper and lower layers of the decoder, guiding the decoder toward precise segmentation in a coarse-to-fine manner. Finally, extensive experimental results demonstrate the effectiveness of our method. On the MFNet dataset, MFTNet achieves 74.7 mAcc and 57.3 mIoU, outperforming the state-of-the-art methods.
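The two reported metrics, mAcc (mean per-class accuracy) and mIoU (mean intersection over union), are conventionally derived from a pixel-level confusion matrix. The following is a minimal sketch of that standard computation, not the authors' evaluation code: the function names `confusion_matrix` and `mean_acc_and_iou` are hypothetical, labels are assumed to be flattened integer class indices, and skipping classes absent from the data is an assumption, since the abstract does not describe the evaluation protocol.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_acc_and_iou(cm):
    """Return (mAcc, mIoU) as fractions in [0, 1] from a square confusion matrix.

    Assumption: a class with no ground-truth pixels is skipped for mAcc, and a
    class with an empty union is skipped for mIoU.
    """
    n = len(cm)
    accs, ious = [], []
    for c in range(n):
        tp = cm[c][c]                            # pixels correctly labeled c
        gt = sum(cm[c])                          # pixels whose true class is c
        pred = sum(cm[r][c] for r in range(n))   # pixels predicted as c
        union = gt + pred - tp
        if gt > 0:
            accs.append(tp / gt)
        if union > 0:
            ious.append(tp / union)
    m_acc = sum(accs) / len(accs) if accs else 0.0
    m_iou = sum(ious) / len(ious) if ious else 0.0
    return m_acc, m_iou


# Toy example with two classes and four pixels.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
m_acc, m_iou = mean_acc_and_iou(cm)
```

Multiplying the returned fractions by 100 gives values on the same scale as the 74.7 mAcc and 57.3 mIoU quoted in the abstract.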