CDMANet: central difference mutual attention network for RGB-D semantic segmentation

Mengjiao Ge, Wen Su, Jinfeng Gao, Guoqiang Jia
DOI: https://doi.org/10.1007/s11227-024-06760-z
IF: 3.3
2024-12-06
The Journal of Supercomputing
Abstract: RGB-D semantic segmentation uses both images and depth maps to classify pixels into semantic classes. Most current methods rely on scale interaction within a single modality or on late fusion of dual-modal information, with little exploration of the correlation between the two modalities at the same scale. Research on cross-scale and cross-modal feature fusion is also limited. This paper introduces a central difference mutual attention network for semantic segmentation based on dual-modal information, using a parallel Transformer encoder to extract multi-level features from images and depth maps. To address issues such as boundary blurring during global information interaction, and to provide effective spatial and semantic information interaction, a central difference mutual attention module is proposed. Finally, a composite cross decoder is employed to capture diverse feature levels while minimizing information loss. Experimental results show that the method outperforms several typical and state-of-the-art segmentation models on two challenging public benchmark datasets, achieving mIoU scores of 51.9 and 79.4 on the NYU-Dv2 and Cityscapes datasets, respectively, which supports its effectiveness in terms of segmentation accuracy. Code is available at https://github.com/eG-Sophia/CDMANet.
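The abstract reports results as mIoU (mean intersection over union), presumably in percent. As a reading aid, here is a minimal sketch of how that metric is commonly computed. It is not taken from the CDMANet repository: the function name `mean_iou`, its parameters, and the toy label lists are hypothetical, and benchmark evaluations usually accumulate counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels of equal length.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both the predicted
            # class and the true class.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example (hypothetical data): a flattened 2x3 label map with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Multiplying by 100 gives a value on the same scale as the figures quoted in the abstract.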
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture
What problem does this paper attempt to address?