HDNet: Multi-Modality Hierarchy-Aware Decision Network for RGB-D Salient Object Detection

Chengxing Xia, Songsong Duan, Bin Ge, Hanling Zhang, Kuan-Ching Li
DOI: https://doi.org/10.1109/lsp.2022.3229640
2022-01-01
IEEE Signal Processing Letters
Abstract: RGB-D salient object detection (SOD) is a pixel-level dense prediction task that highlights the prominent objects in a scene. Recently, convolutional neural networks (CNNs) have been widely applied to SOD to generate multi-level features that are complementary to each other. However, most methods ignore the distinct characteristics of high-level and low-level features. To employ multi-level features effectively, we propose a novel multi-modality hierarchy-aware decision network (HDNet) that embeds a Swin Transformer as its encoder. The proposed HDNet contains three primary designs: (1) a Swin Transformer encoder is employed instead of a CNN to learn long-range dependencies; (2) a hierarchy-aware feature decision mechanism (HFDM) is proposed to exploit the local detail cues of low-level features and the global semantic information of high-level features, and it consists of two sub-modules, namely a low-hierarchy edge module (LEM) and a high-hierarchy region module (HRM); (3) a decision-based fusion module (DFM) is designed to fuse RGB and depth features according to the attributes of the multi-level features generated by the HFDM. Experiments on five public benchmarks verify that our framework outperforms 18 other state-of-the-art algorithms.
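The routing idea in designs (2) and (3) can be illustrated with a toy sketch. Only the data flow is taken from the abstract: low-level features go through an edge-oriented module, high-level features go through a region-oriented module, and RGB and depth are then fused level by level. Everything else is an assumption made for illustration: the function names, the `num_low` split point, and the placeholder arithmetic inside each module are hypothetical and are not the authors' LEM, HRM, or DFM.

```python
from typing import List

Feature = List[float]  # toy stand-in for a feature map (assumption)


def low_edge_module(feat: Feature) -> Feature:
    """Placeholder for LEM: absolute neighbour differences as a crude edge cue."""
    return [abs(b - a) for a, b in zip(feat, feat[1:])] + [0.0]


def high_region_module(feat: Feature) -> Feature:
    """Placeholder for HRM: broadcast the global mean as a crude region cue."""
    mean = sum(feat) / len(feat)
    return [mean] * len(feat)


def decision_fusion(rgb: Feature, depth: Feature) -> Feature:
    """Placeholder for DFM: element-wise average of the two modalities."""
    return [(r + d) / 2.0 for r, d in zip(rgb, depth)]


def hdnet_routing(
    rgb_levels: List[Feature],
    depth_levels: List[Feature],
    num_low: int = 2,  # hypothetical split between low and high levels
) -> List[Feature]:
    """Route each encoder level to LEM or HRM, then fuse RGB with depth."""
    fused = []
    for level, (rgb, depth) in enumerate(zip(rgb_levels, depth_levels)):
        module = low_edge_module if level < num_low else high_region_module
        fused.append(decision_fusion(module(rgb), module(depth)))
    return fused


if __name__ == "__main__":
    rgb = [[0, 1, 3], [1, 1, 1], [2, 4, 6]]
    depth = [[0, 2, 2], [0, 0, 0], [1, 1, 1]]
    for level, feat in enumerate(hdnet_routing(rgb, depth)):
        print(level, feat)
```

In a real model each placeholder would be a learned module operating on Swin Transformer feature maps; the sketch only shows that the choice of module depends on the level of the feature, which is the "hierarchy-aware decision" the abstract describes.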