MS-IRTNet: Multistage Information Interaction Network for RGB-T Semantic Segmentation

Zhiwei Zhang, Yisha Liu, Weimin Xue
DOI: https://doi.org/10.1016/j.ins.2023.119442
IF: 8.1
2023-01-01
Information Sciences
Abstract: The complementary information in RGB and thermal images can remarkably boost semantic segmentation performance. Existing RGB-T segmentation methods usually use simple interaction strategies to extract complementary information from RGB and thermal images, which ignores the recognizable features that arise from the different imaging mechanisms. To address this problem, we propose a multistage information interaction network for RGB-T semantic segmentation called MS-IRTNet. MS-IRTNet has a dual-stream encoder structure that extracts multistage feature information. To improve the interaction of multimodal information, we design a gate-weighted interaction module (GWIM) and a feature information interaction module (FIIM). GWIM learns multimodal information weights in different channels, while FIIM integrates and fuses the weighted RGB and thermal information into a single feature map. Finally, the multistage interactive information is fed into the decoder for semantic prediction. Our method achieves 60.5 mIoU on the MFNet dataset, outperforming state-of-the-art methods. Notably, MS-IRTNet also achieves state-of-the-art results in tests on daytime images (51.7 mIoU) and nighttime images (62.5 mIoU). The code and pre-trained models are available at https://github.com/poisonzzw/MS-IRTNet.
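The abstract describes the two modules only at a high level, so the following NumPy sketch is an illustration of the general idea rather than the authors' implementation. It assumes that the channel weights come from globally pooled descriptors passed through a sigmoid gate, and that fusion is a channel-mixing step equivalent to a 1x1 convolution. The function names `gate_weights` and `fuse`, the parameter shapes, and the random stand-in weights are all hypothetical; the actual GWIM and FIIM designs are in the linked repository.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gate_weights(rgb, thermal, w_gate, b_gate):
    """Per-channel gates for both modalities (assumed GWIM-like step).

    rgb, thermal: feature maps of shape (C, H, W).
    w_gate: (2C, 2C) matrix and b_gate: (2C,) bias, stand-ins for learned parameters.
    """
    # Global average pooling gives one descriptor per channel and modality.
    desc = np.concatenate([rgb.mean(axis=(1, 2)), thermal.mean(axis=(1, 2))])
    gates = sigmoid(w_gate @ desc + b_gate)  # each gate lies in (0, 1)
    c = rgb.shape[0]
    return gates[:c], gates[c:]


def fuse(rgb, thermal, g_rgb, g_th, w_fuse):
    """Weight each modality per channel, then mix into one map (assumed FIIM-like step).

    w_fuse: (C, 2C) matrix acting as a 1x1 convolution over the stacked channels.
    """
    weighted = np.concatenate(
        [rgb * g_rgb[:, None, None], thermal * g_th[:, None, None]], axis=0
    )  # shape (2C, H, W)
    return np.einsum("oc,chw->ohw", w_fuse, weighted)  # shape (C, H, W)


# Random stand-ins for one encoder stage; a real model would learn the weights.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
rgb = rng.standard_normal((C, H, W))
thermal = rng.standard_normal((C, H, W))
w_gate = rng.standard_normal((2 * C, 2 * C)) * 0.1
b_gate = np.zeros(2 * C)
w_fuse = rng.standard_normal((C, 2 * C)) * 0.1

g_rgb, g_th = gate_weights(rgb, thermal, w_gate, b_gate)
fused = fuse(rgb, thermal, g_rgb, g_th, w_fuse)
print(fused.shape)
```

In a multistage setup such as the one the abstract outlines, a step like this would presumably run once per encoder stage, with each fused map passed on to the decoder.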