Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection
Fushuo Huo, Xuegui Zhu, Lei Zhang, Qifeng Liu, Yu Shu
DOI: https://doi.org/10.1109/tcsvt.2021.3102268
IF: 5.859
2021-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: RGB-T salient object detection (SOD) aims to exploit the complementary cues of the RGB and thermal (T) modalities to detect and segment common objects. However, existing methods have two limitations. First, they simply fuse the features of the two modalities without fully considering the characteristics of RGB and T. Second, their high computational cost prevents their use in real-world applications (e.g., autonomous driving, anomaly detection, person re-ID). To this end, we propose an efficient encoder-decoder network named Context-guided Stacked Refinement Network (CSRNet). Specifically, we utilize a lightweight backbone and design efficient decoder parts, which greatly reduce the computational cost. To fuse the RGB and T modalities, we propose an efficient Context-guided Cross Modality Fusion (CCMF) module that filters noise and explores the complementarity of the two modalities. In addition, the Stacked Refinement Network (SRN) progressively refines the features in a top-down manner via the interaction of semantic and spatial information. Extensive experiments show that our method performs favorably against state-of-the-art algorithms on the RGB-T SOD task, with a small model size (4.6M), few FLOPs (4.2G), and real-time speed (38 fps). Our code is available at: https://github.com/huofushuo/CSRNet.
engineering, electrical & electronic
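
The abstract describes cross-modality fusion only at a high level, so the following is a minimal illustrative sketch of one generic way a global context signal could gate two modalities. It is not the CCMF module from the paper: the function names (`global_avg_pool`, `gated_fusion`), the per-channel sigmoid gate, and the feature layout (a list of channels, each a flat list of spatial values) are all assumptions made for illustration.

```python
import math


def global_avg_pool(feat):
    """Average each channel over its spatial positions.

    feat is a list of channels; each channel is a flat list of floats.
    """
    return [sum(channel) / len(channel) for channel in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb, thermal):
    """Hypothetical context-gated fusion (an assumption, not the paper's CCMF).

    For each channel, a gate is computed from the difference of the two
    modalities' global context. The gate weights the RGB feature and its
    complement weights the thermal feature.
    """
    if len(rgb) != len(thermal):
        raise ValueError("rgb and thermal must have the same number of channels")

    ctx_rgb = global_avg_pool(rgb)
    ctx_t = global_avg_pool(thermal)

    fused = []
    for r_ch, t_ch, c_r, c_t in zip(rgb, thermal, ctx_rgb, ctx_t):
        # Gate is above 0.5 when the RGB channel responds more strongly overall.
        gate = sigmoid(c_r - c_t)
        fused.append([gate * r + (1.0 - gate) * t for r, t in zip(r_ch, t_ch)])
    return fused


if __name__ == "__main__":
    rgb = [[1.0, 1.0], [0.0, 0.0]]
    thermal = [[1.0, 1.0], [2.0, 2.0]]
    print(gated_fusion(rgb, thermal))
```

In this toy input, the second RGB channel is silent while the thermal channel is active, so the gate leans toward the thermal feature for that channel. A learned module would typically replace the fixed difference with trainable layers; consult the linked repository for the authors' actual design.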