Abstract: The complementary properties exhibited by RGB-T data involve both context complementarity and content complementarity. During cross-modal feature fusion, most existing RGB-T semantic segmentation methods focus on exploiting content-complementary information. Unfortunately, these methods usually either overlook cross-modal context-complementary information (i.e., the contextual dependencies among different regions that exist in only one modality) or exploit it only implicitly, yielding fragmentary semantic segmentation results. To remedy this problem, this paper presents a novel Cross-modal Context- and Content-Complementarity Network (C4Net) for RGB-T semantic segmentation, in which both the cross-modal context-complementary information and the cross-modal content-complementary information are fully excavated and exploited during cross-modal feature fusion. Specifically, a Context-Complementary Information Aggregation (CxCIA) module is designed, in which the cross-modal context-complementary information is explicitly excavated by measuring the discrepancies between the contextual dependencies of the different modalities. This information is then used to enhance the original RGB and thermal contextual dependencies, improving the integrity of objects in the fused features. Meanwhile, a Content-Complementary Information Aggregation (CnCIA) module is presented, which exploits cross-modal content-complementary information from a multi-scale perspective. Furthermore, an MLP-based Multi-level Feature Interaction (MFI) decoder is presented, which mitigates the semantic gaps among different levels of fused features by establishing interactions among multi-level fused features along the spatial and channel dimensions. Comprehensive experimental results on several public datasets demonstrate that the proposed C4Net surpasses other state-of-the-art models.
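
The abstract describes excavating context-complementary information by measuring discrepancies between the contextual dependencies of the two modalities, but gives no formulas. The following is only a minimal illustrative sketch of that general idea, not the authors' CxCIA module. It assumes (hypothetically) that contextual dependencies can be modeled as a row-wise softmax affinity between region features and that the discrepancy is an element-wise absolute difference; the function names `affinity` and `context_discrepancy` and the toy features are invented for illustration.

```python
import math


def affinity(features):
    """Assumed stand-in for 'contextual dependencies': for each region,
    a softmax over its dot-product similarity to every region."""
    rows = []
    for f_i in features:
        scores = [sum(a * b for a, b in zip(f_i, f_j)) for f_j in features]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def context_discrepancy(rgb_features, thermal_features):
    """Assumed discrepancy measure: element-wise absolute difference
    between the RGB and thermal affinity matrices."""
    a_rgb = affinity(rgb_features)
    a_th = affinity(thermal_features)
    disc = [[abs(x - y) for x, y in zip(r, t)] for r, t in zip(a_rgb, a_th)]
    return a_rgb, a_th, disc


# Toy example: three regions with 2-D features per modality (made-up values).
# Region 1 resembles region 0 in RGB but resembles region 2 in thermal.
rgb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
thermal = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
a_rgb, a_th, disc = context_discrepancy(rgb, thermal)
```

Under these assumptions, large entries in `disc` mark region pairs whose dependency is strong in one modality and weak in the other, which is the kind of signal the abstract says is used to enhance the original RGB and thermal contextual dependencies.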

C4Net: Excavating Cross-modal Context- and Content-Complementarity for RGB-T Semantic Segmentation
