Abstract: Depth cues are known to be useful for visual perception. However, direct measurement of depth is often impracticable. Fortunately, modern learning-based methods offer promising depth maps by inference in the wild. In this work, we adapt such depth inference models for object segmentation using the objects' "pop-out" prior in 3D. The "pop-out" is a simple composition prior that assumes objects reside on the background surface. Such a compositional prior allows us to reason about objects in 3D space. More specifically, we adapt the inferred depth maps such that objects can be localized using only 3D information. Such separation, however, requires knowledge about the contact surface, which we learn using the weak supervision of the segmentation mask. Our intermediate representation of the contact surface, and thereby reasoning about objects purely in 3D, allows us to better transfer the depth knowledge into semantics. The proposed adaptation method uses only the depth model without needing the source data used for training, making the learning process efficient and practical. Our experiments on eight datasets of two challenging tasks, namely camouflaged object detection and salient object detection, consistently demonstrate the benefit of our method in terms of both performance and generalizability.
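The separation idea in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that depth is stored as a 2D list in which smaller values are closer to the camera, and that a per-pixel contact surface depth is already available (the paper learns it; here it is set by hand). The names `pop_out_mask` and `margin` are illustrative and do not come from the paper.

```python
from typing import List

Grid = List[List[float]]


def pop_out_mask(depth: Grid, contact: Grid, margin: float = 0.0) -> List[List[int]]:
    """Mark pixels whose depth lies in front of the contact surface.

    Assumption: smaller depth values are closer to the camera, so a pixel
    belongs to a "popped-out" object when contact - depth exceeds `margin`.
    """
    if len(depth) != len(contact):
        raise ValueError("depth and contact must have the same number of rows")
    mask = []
    for d_row, c_row in zip(depth, contact):
        if len(d_row) != len(c_row):
            raise ValueError("depth and contact rows must have the same length")
        mask.append([1 if (c - d) > margin else 0 for d, c in zip(d_row, c_row)])
    return mask


# Toy scene: a flat background at depth 5.0 with a 2x2 object near depth 3.0.
depth = [
    [5.0, 5.0, 5.0, 5.0],
    [5.0, 3.0, 3.2, 5.0],
    [5.0, 3.1, 3.0, 5.0],
    [5.0, 5.0, 5.0, 5.0],
]
# Hand-set contact surface (hypothetical): a plane slightly in front of the background.
contact = [[4.5] * 4 for _ in range(4)]

mask = pop_out_mask(depth, contact, margin=0.5)
for row in mask:
    print(row)
```

Only the four central pixels sit in front of the contact surface by more than the margin, so only they are marked as object. A learned contact surface would replace the hand-set plane.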

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses cross-domain and cross-task knowledge transfer, in particular transferring depth knowledge to target tasks without access to the source data. Specifically, the authors propose a method that leverages the "pop-out" prior of objects to improve object detection tasks, including Salient Object Detection (SOD) and Camouflaged Object Detection (COD).

### Background and Challenges

1. **Importance of Depth Information**:
   - Depth cues are very useful for visual perception, but directly measuring depth is often not feasible.
   - Modern learning-based methods can infer high-quality depth maps in the wild.
2. **Challenges of Cross-Domain and Cross-Task Transfer**:
   - Existing Source-Free Domain Adaptation (SDA) methods usually assume that the source task and target task are similar, or that the label space is discrete and known.
   - These assumptions may not hold in practical applications, especially when transferring depth knowledge.

### Proposed Method

1. **"Pop-out" Prior**:
   - The prior assumes that objects reside on the background surface, which makes it possible to infer the position of objects in 3D space.
   - By learning the contact surface, objects can be separated from the background, so that depth knowledge is better converted into semantic information.
2. **Framework Overview**:
   - **Source-Free Depth Network**: Generates source-free depth maps.
   - **Object Pop-out Network**: Converts source-free depth maps into object pop-out depth maps.
   - **Segmentation Network**: Uses pop-out depth maps to estimate object masks and contact surfaces.
   - **Object Separation Module**: Uses the contact surface to separate objects from the background, generating pseudo-semantic masks.
3. **Loss Functions**:
   - **Depth Pop-out Loss**: Ensures structural similarity between pop-out depth maps and source-free depth maps.
   - **Local Smoothness Loss**: Ensures depth smoothness within object regions.
   - **Edge Sharpening Loss**: Enhances depth changes at object boundaries.
   - **Segmentation Loss**: Penalizes the gap between semantic predictions and ground truth.

### Experimental Results

1. **Datasets**:
   - Experiments were conducted on eight datasets: four SOD datasets (NLPR, NJUK, STERE, SIP) and four COD datasets (CAMO, CHAMELEON, COD10K, NC4K).
2. **Performance Evaluation**:
   - Four commonly used evaluation metrics were used: Mean Absolute Error (M), Maximum F-measure (Fm), S-measure (Sm), and Maximum E-measure (Em).
   - Results show that the proposed method significantly outperforms baseline methods in both SOD and COD tasks, achieving state-of-the-art performance on multiple datasets.

### Main Contributions

1. **Practical and Novel Problem**: Transferring depth knowledge across domains and tasks without source data.
2. **Simple and Effective Prior**: Using the "pop-out" prior of objects for visual understanding.
3. **Significant Performance Improvement**: Achieving significant performance gains over baseline methods and existing models in two different tasks.

Through these contributions, the paper provides a new and effective solution for cross-domain and cross-task transfer of depth knowledge.
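Two of the evaluation metrics listed above, Mean Absolute Error and Maximum F-measure, are simple enough to sketch directly. The snippet below is a minimal illustration in plain Python, not the evaluation code used in the paper. The weight `beta_sq = 0.3` is a common convention in salient object detection and is assumed here, since the summary does not state it. The function names and the 256-step threshold sweep are likewise illustrative choices. S-measure and E-measure are omitted.

```python
from typing import List

Map = List[List[float]]


def mean_absolute_error(pred: Map, gt: Map) -> float:
    """Average per-pixel absolute difference between prediction and ground truth."""
    flat_p = [p for row in pred for p in row]
    flat_g = [g for row in gt for g in row]
    return sum(abs(p - g) for p, g in zip(flat_p, flat_g)) / len(flat_p)


def f_measure(pred: Map, gt: Map, threshold: float, beta_sq: float = 0.3) -> float:
    """Weighted F-measure of the prediction binarized at `threshold`."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            hit = p >= threshold
            if hit and g == 1:
                tp += 1
            elif hit:
                fp += 1
            elif g == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(pred: Map, gt: Map, steps: int = 255, beta_sq: float = 0.3) -> float:
    """Best F-measure over evenly spaced thresholds in [0, 1]."""
    return max(f_measure(pred, gt, t / steps, beta_sq) for t in range(steps + 1))


# Toy 2x2 prediction (values in [0, 1]) and binary ground-truth mask.
pred = [[0.9, 0.2], [0.6, 0.1]]
gt = [[1, 0], [1, 0]]

mae = mean_absolute_error(pred, gt)
best_f = max_f_measure(pred, gt)
print(f"MAE = {mae:.3f}, max F-measure = {best_f:.3f}")
```

In this toy case any threshold between 0.2 and 0.6 separates the two object pixels from the background perfectly, so the maximum F-measure reaches 1 even though the MAE is nonzero. This is why the two metrics are usually reported together.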

Source-free Depth for Object Pop-out

Depth incorporating with color improves salient object detection

Depth-aware salient object detection using anisotropic center-surround difference

Depth Is All You Need for Monocular 3D Detection

Probabilistic and Geometric Depth: Detecting Objects in Perspective

Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild

Object segmentation from sparse views of wide-baseline images

Task-Aware Monocular Depth Estimation for 3D Object Detection

Understanding Depth Map Progressively: Adaptive Distance Interval Separation for Monocular 3d Object Detection

DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection

Depth-guided Texture Diffusion for Image Semantic Segmentation

Depth-aware Panoptic Segmentation

On the Viability of Monocular Depth Pre-training for Semantic Segmentation

Unsupervised Monocular Depth Perception: Focusing on Moving Objects

Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision

Joint Object Segmentation and Depth Upsampling

MonoCD: Monocular 3D Object Detection with Complementary Depths

Semisupervised learning-based depth estimation with semantic inference guidance

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image

Exploring Depth Contribution for Camouflaged Object Detection

Unsupervised Joint 3D Object Model Learning and 6D Pose Estimation for Depth-Based Instance Segmentation