Abstract: Semantic segmentation is a fundamental problem in multimedia that requires delicate per-pixel predictions of object categories. Recently, many researchers have sought to refine pixel-wise features with spatial-contextual information. However, many of these methods still neglect the "invisible hand" of cross-channel information, which provides inherent semantics that benefit segmentation performance. On the one hand, in the feature extraction stage, enhancing informative channels and suppressing trivial ones helps acquire valuable semantic features and thus improves segmentation accuracy. On the other hand, in the prediction stage, complete objects can be predicted more clearly by finding the connections and complements between different channels, which also contributes to pixel prediction. Based on this idea, we propose a novel Channel-Adaptive Network for semantic segmentation, which enhances features from the perspective of channels in both the feature extraction stage and the prediction stage. Specifically, we propose two modules: (i) the Comprehensive Information Channel Attention (CiCA) module, which addresses the shortcomings of existing channel attention by learning both low- and high-frequency components within each channel to emphasize the informative channels; and (ii) the Inter-Channel Relationship Reasoning (iCRR) module, which is applied on top of the feature extractor to adaptively enhance interdependent channels by mining the complementary associations between them. In addition, our Channel-Adaptive Network is highly flexible, with a plug-and-play design. Extensive experiments demonstrate that our method achieves state-of-the-art segmentation performance on three challenging datasets: Cityscapes (82.1%), ADE20K (46.51%), and PASCAL Context (55.0%).
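To make the idea of channel attention with both low- and high-frequency information concrete, the following is a minimal sketch and not the CiCA module itself, whose details the abstract does not give. It assumes the low-frequency descriptor is the spatial mean of a channel and stands in for high-frequency content with the standard deviation around that mean; the function names and the scalar parameters `w_low`, `w_high`, and `bias` are hypothetical placeholders for learned layers.

```python
import math


def channel_descriptors(feature):
    """feature: list of C channels, each an H x W list of lists of floats.

    Returns one (low, high) pair per channel:
      low  = spatial mean (the DC, i.e. lowest-frequency, component)
      high = standard deviation around the mean (assumed proxy for the
             energy of all remaining, higher-frequency content)
    """
    descs = []
    for ch in feature:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        descs.append((mean, math.sqrt(var)))
    return descs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature, w_low=1.0, w_high=1.0, bias=0.0):
    """Scale every channel by a gate in (0, 1).

    w_low, w_high, bias are hypothetical scalars standing in for the
    learned mapping from descriptors to channel weights.
    """
    gates = [
        sigmoid(w_low * lo + w_high * hi + bias)
        for lo, hi in channel_descriptors(feature)
    ]
    out = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature)]
    return out, gates


# Two 2x2 channels with the same mean (0): one flat, one textured.
flat = [[0.0, 0.0], [0.0, 0.0]]
textured = [[1.0, -1.0], [-1.0, 1.0]]
out, gates = channel_attention([flat, textured])
print(gates)
```

In this toy input both channels have zero mean, so a gate built from the mean alone could not tell them apart; adding the second descriptor gives the textured channel a larger weight than the flat one.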

CSANet for Video Semantic Segmentation with Inter-Frame Mutual Learning

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Attention-Guided Network for Semantic Video Segmentation

Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation

Learning Cross-Channel Representations for Semantic Segmentation

DSANet: Dilated Spatial Attention for Real-Time Semantic Segmentation in Urban Street Scenes

Cross Attention Network For Semantic Segmentation

ASFNet: Adaptive Multiscale Segmentation Fusion Network for Real‐time Semantic Segmentation

Capturing the Spatio-Temporal Continuity for Video Semantic Segmentation

Sparse Spatial Attention Network for Semantic Segmentation

Based on cross-scale fusion attention mechanism network for semantic segmentation for street scenes

MASANet: Multi-Angle Self-Attention Network for Semantic Segmentation of Remote Sensing Images

CCANet: Cross-Modality Comprehensive Feature Aggregation Network for Indoor Scene Semantic Segmentation

Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes

CSANet: Cross-Semantic Attention Network for Open-Set Object Recognition

MFSA-Net: Semantic Segmentation with Camera-LiDAR Cross-Attention Fusion Based on Fast Neighbor Feature Aggregation

Adaptive multi-scale dual attention network for semantic segmentation

ACANet: Across-Scale Context Attention Network for Real-Time Semantic Segmentation

Scale-aware Attention Network for Weakly Supervised Semantic Segmentation

CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes

Real-Time Semantic Segmentation via Multiply Spatial Fusion Network