EFDCNet: Encoding Fusion and Decoding Correction Network for RGB-D Indoor Semantic Segmentation
Jianlin Chen,Gongyang Li,Zhijiang Zhang,Dan Zeng
DOI: https://doi.org/10.1016/j.imavis.2023.104892
IF: 3.86
2024-01-01
Image and Vision Computing
Abstract:Semantic segmentation is a crucial task in vision measurement systems that involves understanding and segmenting different objects and regions within an image. Over the years, numerous RGB-D semantic segmentation methods have been developed, leveraging the encoder-decoder architecture to achieve outstanding performance. However, existing methods have two main problems that constrain further performance improvement. Firstly, in the encoding stage, existing methods have a weak ability to fuse cross-modal information, and low-quality depth maps can easily lead to poor feature representation. Secondly, in the decoding stage, the upsampling of high-level semantic information may cause the loss of contextual information, and low-level features from the encoder may bring noises to the decoder through skip connections. To solve these issues, we propose a novel Encoding Fusion and Decoding Correction Network (EFDCNet) for RGB-D indoor semantic segmentation. First, in the encoding stage of EFDCNet, we focus on extracting valuable information from low-quality depth maps, and employ a channel-wise filter to select informative depth features. Additionally, we establish the global dependencies between RGB and depth features via the self-attention mechanism to enhance the cross-modal feature interactions, extracting discriminant and powerful features. Then, in the decoding stage of EFDCNet, we use the highest-level information as semantic guidance to compensate for the upsampling information and filter out noise from the low-level encoder features propagated through the skip connections to the decoder. Extensive experiments conducted on two widely-used RGB-D indoor semantic segmentation datasets demonstrate that the proposed EFDCNet surpasses the performance of relevant state-of-the-art methods. The code is available at https://github.com/ Mark9010/EFDCNet