Abstract: Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment, and classify objects at various scales. When multi-level features meet multi-modal cues, finding the optimal feature aggregation and multi-modal learning strategy becomes a challenging problem. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel Bifurcated Backbone Strategy Network (BBS-Net). Our architecture is simple, efficient, and backbone-independent. Specifically, we first propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. The RGB and depth modalities are then fused in a complementary way. Extensive experiments show that BBS-Net significantly outperforms 18 state-of-the-art (SOTA) models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach (~4% improvement in S-measure vs. the top-ranked model: DMRA). In addition, we provide a comprehensive analysis of the generalization ability of different RGB-D datasets and provide a powerful training set for future research.
The complete algorithm, benchmark results, and post-processing toolbox are publicly available at https://github.com/zyjwuyan/BBS-Net.

Light-TBFNet: RGB-D Salient Detection Based on a Lightweight Two-Branch Fusion Strategy

Dual-Branch Feature Fusion Network for Salient Object Detection

Middle-Level Feature Fusion for Lightweight RGB-D Salient Object Detection

Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection

A Single Stream Network for Robust and Real-Time RGB-D Salient Object Detection

JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network

Bidirectional Feature Learning Network for RGB-D Salient Object Detection

Middle-level Fusion for Lightweight RGB-D Salient Object Detection

Feature Calibrating and Fusing Network for RGB-D Salient Object Detection

Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Bifurcated Backbone Strategy for RGB-D Salient Object Detection

CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection

MFUR-Net

Discriminative feature fusion for RGB-D salient object detection

Hybrid Attention Mechanism and Forward Feedback Unit for RGB-D Salient Object Detection

An adaptive guidance fusion network for RGB-D salient object detection

Feature interaction and two-stage cross-modal fusion for RGB-D salient object detection

RGB-D Salient Object Detection Method Based on Multi-Modal Fusion and Contour Guidance

TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection

Unidirectional RGB-T salient object detection with intertwined driving of encoding and fusion