Abstract: Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. With multi-camera inputs, the structural, textural, and geometric consistency across views can be leveraged to achieve fine-grained object segmentation. To make better use of this information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework that segments objects in each view using a 3D surface representation learned from multi-view images of a scene. To model high-quality geometric surfaces for complex scenes, we design a novel scene representation scheme that decomposes the scene into two complementary neural representation modules, each modeled with a Signed Distance Function (SDF). Moreover, Surface-SOS can refine single-view segmentation using unlabeled multi-view images by taking coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to remove the dependence on large amounts of annotated data and on strong constraints, which typically involve observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks, including LLFF, CO3D, BlendedMVS, and TUM, as well as several real-world scenes, show that Surface-SOS consistently yields finer object masks than its NeRF-based counterparts and substantially outperforms supervised single-view baselines. Code is available at: https://github.com/zhengxyun/Surface-SOS.

SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks

Semantic Reconstruction based on RGB Image and Sparse Depth

SSNet: a joint learning network for semantic segmentation and disparity estimation

S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

Hybridnet for depth estimation and semantic segmentation

CI-Net: a joint depth estimation and semantic segmentation network using contextual information

Simultaneous Semantic Segmentation and Depth Completion with Constraint of Boundary

Surface-SOS: Self-Supervised Object Segmentation via Neural Surface Representation

JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds

DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion

Rethinking Training Objective for Self-Supervised Monocular Depth Estimation - Semantic Cues to Rescue

An Object-Aware Network Embedding Deep Superpixel for Semantic Segmentation of Remote Sensing Images

RelationNet: Learning Deep-Aligned Representation for Semantic Image Segmentation

SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion

JSH-Net: joint semantic segmentation and height estimation using deep convolutional networks from single high-resolution remote sensing imagery

CSDNet: Detect Salient Object in Depth-Thermal via A Lightweight Cross Shallow and Deep Perception Network

MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection

HDNet: Hybrid Distance Network for semantic segmentation

An Efficient Ensemble Deep Learning Approach for Semantic Point Cloud Segmentation Based on 3D Geometric Features and Range Images