Complementarity-aware cross-modal feature fusion network for RGB-T semantic segmentation

Wei Wu, Tao Chu, Qiong Liu
DOI: https://doi.org/10.1016/j.patcog.2022.108881
IF: 8
2022-07-14
Pattern Recognition
Abstract: RGB-T semantic segmentation has attracted growing attention because it makes a model robust to challenging illumination. Most existing methods fuse RGB and thermal information equally along the spatial dimensions, which results in feature redundancy and reduces the discriminability of cross-modal features. In this paper, we propose a Complementarity-aware Cross-modal Feature Fusion Network (CCFFNet) comprising a Complementarity-Aware Encoder (CAE) and a Three-Path Fusion and Supervision (TPFS) module. The CAE, which consists of cascaded cross-modal fusion modules, selects complementary information from RGB and thermal features via a novel gate and fuses them with a channel-wise weighting mechanism. TPFS not only iteratively performs Three-Path Fusion (TPF) to further enhance cross-modal features, but also supervises the training of CCFFNet along three branches by Three-Supervision (TS). Extensive experiments demonstrate that our model outperforms state-of-the-art models by at least 1.6% mIoU on the MFNet dataset and 2.9% mIoU on the PST900 dataset. Moreover, a single-modality model can easily be applied to multi-modal semantic segmentation by plugging in our CAE.
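The gate-then-channel-weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the CAE from the paper: the abstract does not specify the gate's form or how the channel weights are computed, so the function name `gated_channel_fusion`, the parameters `w_gate` and `w_channel`, the sigmoid gate, and the softmax over modalities are all assumptions chosen only to show the general pattern.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_channel_fusion(rgb, thermal, w_gate, w_channel):
    """Hypothetical gated, channel-weighted fusion of two feature maps.

    rgb, thermal : arrays of shape (C, H, W)
    w_gate       : (C, 2C) weights of an assumed 1x1 convolution for the gate
    w_channel    : (2, C) assumed per-modality, per-channel scaling weights
    Returns the fused map (C, H, W) and the channel weights (2, C).
    """
    # Gate: a 1x1 convolution over the concatenated features, squashed to (0, 1).
    concat = np.concatenate([rgb, thermal], axis=0)            # (2C, H, W)
    gate = sigmoid(np.einsum("oc,chw->ohw", w_gate, concat))   # (C, H, W)

    # Each modality receives a gated share of the other one (assumed scheme).
    rgb_enh = rgb + gate * thermal
    thermal_enh = thermal + (1.0 - gate) * rgb

    # Channel-wise weighting: global average pooling, then a softmax across
    # the two modalities so the weights of each channel sum to 1.
    pooled = np.stack([rgb_enh.mean(axis=(1, 2)),
                       thermal_enh.mean(axis=(1, 2))])         # (2, C)
    logits = w_channel * pooled
    logits = logits - logits.max(axis=0, keepdims=True)        # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

    fused = (weights[0][:, None, None] * rgb_enh
             + weights[1][:, None, None] * thermal_enh)
    return fused, weights
```

Because the softmax weights sum to 1 for every channel, the fused map is a per-channel convex combination of the two enhanced features rather than an equal-weight sum, which is the contrast with equal fusion that the abstract draws.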
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic