Learning Cross-Channel Representations for Semantic Segmentation
Lingfeng Ma, Hongtao Xie, Chuanbin Liu, Yongdong Zhang
DOI: https://doi.org/10.1109/tmm.2022.3151145
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Semantic segmentation is a fundamental problem in multimedia which requires delicate per-pixel predictions of object categories. Recently, many researchers have sought to refine pixel-wise features with spatial-contextual information. However, many of them still neglect the invisible hand of cross-channel information, which provides inherent semantics that facilitate segmentation. On the one hand, in the feature extraction stage, enhancing informative channels and suppressing trivial ones contributes to the acquisition of valuable semantic features and thus improves segmentation accuracy. On the other hand, in the prediction stage, complete objects can be predicted more clearly by finding the connections and complements between different channels, which also contributes to the pixel prediction. Based on this idea, we propose a novel Channel-Adaptive Network for semantic segmentation, which is capable of enhancing the features from the perspective of channels in both the feature extraction stage and the prediction stage. Specifically, we propose two modules: (i) the Comprehensive Information Channel Attention (CiCA) module, which addresses the shortcomings of existing channel attention by learning both low- and high-frequency components within each channel to emphasize the informative channels; (ii) the Inter-Channel Relationship Reasoning (iCRR) module, which is applied on top of the feature extractor to adaptively enhance the interdependent channels by mining the complementary associations between them. Besides, our Channel-Adaptive Network is highly flexible, with a plug-and-play design. Extensive experiments demonstrate that our method achieves state-of-the-art segmentation performance on three challenging datasets: Cityscapes (82.1%), ADE20K (46.51%) and PASCAL Context (55.0%).
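The abstract gives no formulas for either module, so the following is only a rough sketch of the two general ideas it names, not the authors' CiCA or iCRR. It assumes a feature map stored as a list of channels, each a flat list of activations. The per-channel mean stands in for the low-frequency component, the mean absolute deviation is an assumed proxy for high-frequency content, and the softmax affinity over channel dot products is a generic channel-relation formulation. All function names and the weights `w_low` and `w_high` are hypothetical, and plain Python is used to keep the sketch dependency-free.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_gates(feat, w_low=1.0, w_high=1.0):
    """Return one gate in (0, 1) per channel.

    feat: list of C channels, each a flat list of H*W activations.
    w_low, w_high: hypothetical fixed weights; a real module would learn them.
    """
    gates = []
    for ch in feat:
        mean = sum(ch) / len(ch)  # low-frequency (DC) summary
        # Assumed proxy for high-frequency content: mean absolute deviation.
        spread = sum(abs(v - mean) for v in ch) / len(ch)
        gates.append(sigmoid(w_low * mean + w_high * spread))
    return gates


def reweight(feat, gates):
    """Scale every channel by its gate (emphasize or suppress it)."""
    return [[g * v for v in ch] for ch, g in zip(feat, gates)]


def channel_affinity(feat):
    """Row-wise softmax over channel-to-channel dot products (C x C)."""
    rows = []
    for a in feat:
        scores = [sum(p * q for p, q in zip(a, b)) for b in feat]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def aggregate(feat, affinity):
    """Add to each channel an affinity-weighted mix of all channels."""
    n = len(feat[0])
    out = []
    for ch, row in zip(feat, affinity):
        mixed = [sum(w * other[i] for w, other in zip(row, feat))
                 for i in range(n)]
        out.append([v + m for v, m in zip(ch, mixed)])
    return out


# Toy input: 3 channels over a 2x2 map, flattened to 4 values each.
feat = [
    [1.0, 1.0, 1.0, 1.0],  # constant channel: only low-frequency content
    [0.0, 2.0, 0.0, 2.0],  # same mean, but with high-frequency variation
    [0.0, 0.0, 0.0, 0.0],  # empty channel
]
gates = channel_gates(feat)
affinity = channel_affinity(reweight(feat, gates))
enhanced = aggregate(feat, affinity)
```

In this toy input the second channel receives the largest gate because it carries both a nonzero mean and variation around it, which a mean-only descriptor could not distinguish from the first channel.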