Abstract: The proliferation of 2D foundation models has sparked research into adapting them for open-world 3D instance segmentation. Recent methods introduce a paradigm that leverages superpoints as geometric primitives and incorporates 2D multi-view masks from the Segment Anything Model (SAM) as merging guidance, achieving outstanding zero-shot instance segmentation results. However, the limited use of 3D priors restricts segmentation performance. Previous methods compute 3D superpoints solely from normals estimated from spatial coordinates, resulting in under-segmentation of instances with similar geometry. Besides, the heavy reliance on SAM and hand-crafted algorithms in 2D space leads to over-segmentation due to SAM's inherent tendency toward part-level segmentation. To address these issues, we propose SA3DIP, a novel method for Segmenting Any 3D Instances via exploiting potential 3D Priors. Specifically, on one hand, we generate complementary 3D primitives based on both geometric and textural priors, which reduces the initial errors that accumulate in subsequent procedures. On the other hand, we introduce supplemental constraints from 3D space by using a 3D detector to guide a further merging process. Furthermore, we notice a considerable portion of low-quality ground truth annotations in the ScanNetV2 benchmark, which affects fair evaluation. Thus, we present ScanNetV2-INS, with complete ground truth labels and additional supplemented instances, for 3D class-agnostic instance segmentation. Experimental evaluations on various 2D-3D datasets demonstrate the effectiveness and robustness of our approach. Our code and the proposed ScanNetV2-INS dataset are available online.
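The abstract's first fix, adding a textural prior to the geometric one when forming primitives, can be illustrated with a minimal sketch. It assumes each point already has a unit normal, an RGB color in [0, 1], and a list of neighboring point pairs, and it groups points with a simple threshold-based union-find. The function name `complementary_primitives` and the thresholds `max_angle` and `max_color` are hypothetical; this is a simplified stand-in, not the paper's superpoint algorithm.

```python
# Hypothetical sketch; not the SA3DIP implementation.
import math


def angle_between(n1, n2):
    """Angle in radians between two unit normals (dot product clamped for float safety)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.acos(max(-1.0, min(1.0, dot)))


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def complementary_primitives(normals, colors, edges,
                             max_angle=math.radians(10), max_color=0.1,
                             use_texture=True):
    """Group points into primitives and return one label per point.

    A neighbor edge (i, j) is merged only if the normals agree (geometric prior)
    and, when use_texture is True, the colors also agree (textural prior).
    Thresholds are hypothetical illustration values.
    """
    uf = UnionFind(len(normals))
    for i, j in edges:
        same_geometry = angle_between(normals[i], normals[j]) < max_angle
        same_texture = math.dist(colors[i], colors[j]) < max_color
        if same_geometry and (same_texture or not use_texture):
            uf.union(i, j)
    # Relabel roots to consecutive ids 0..K-1 in order of first appearance.
    roots, labels = {}, []
    for i in range(len(normals)):
        labels.append(roots.setdefault(uf.find(i), len(roots)))
    return labels


# Toy example: a flat wall (identical normals) with a red poster on it.
normals = [(0.0, 0.0, 1.0)] * 4
colors = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]
print(complementary_primitives(normals, colors, edges, use_texture=False))  # [0, 0, 0, 0]
print(complementary_primitives(normals, colors, edges))                     # [0, 0, 1, 1]
```

With normals alone, the poster and the wall collapse into one primitive; the color check splits them. This is the kind of under-segmentation of geometrically similar instances that the abstract attributes to normal-only superpoints.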

What problem does this paper attempt to address?

The main problems this paper addresses are the shortcomings of existing 3D instance segmentation methods when handling instances in open-world scenes: in particular, under-segmentation when instances have similar geometric features, and over-segmentation caused by the part-level segmentation tendency of 2D foundation models such as the Segment Anything Model (SAM). To address these problems, the authors propose SA3DIP (Segment Any 3D Instance with Potential 3D Priors), which generates finer-grained 3D primitives by combining geometric and textural priors and improves the merging process by introducing additional constraints from 3D space, thereby improving the quality and robustness of 3D instance segmentation. Specifically, SA3DIP consists of the following key components:

1. **Generate Complementary 3D Primitives**: Unlike previous methods that compute primitives solely from geometric information, SA3DIP combines geometric and textural information. This reduces errors at the initial stage and prevents instances with similar geometry from being wrongly grouped into the same primitive.
2. **Scene Graph Construction**: The 2D multi-view masks produced by a 2D foundation segmenter (such as SAM) are used to build a superpoint graph, in which 3D primitives are nodes and the affinity scores between them are edge weights. This helps ensure multi-view consistency.
3. **Region Growing and Instance-Aware Refinement**: Affinity- and distance-aware region growing is performed on the scene graph to merge 3D primitives. In addition, 3D spatial constraints from a 3D detector are used to further merge over-segmented 3D instances while preserving the ability to handle fine-grained objects.
4. **ScanNetV2-INS Dataset**: Noting that the widely used ScanNetV2 benchmark contains a large number of low-quality ground-truth annotations, the authors propose an improved version, ScanNetV2-INS, which corrects incomplete annotations and adds more instances, better reflecting real-world scenes and enabling a fairer evaluation of model performance.

Through these contributions, SA3DIP not only improves the accuracy of 3D instance segmentation but also handles complex and unseen 3D scenes better. Experimental results show that the method outperforms existing methods on multiple 2D-3D datasets.
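Steps 2 and 3 can be made concrete with a toy sketch. It assumes a hypothetical encoding in which each superpoint records, per view, the id of the SAM mask it mostly projects into (or `None` if it is not visible); defines affinity as the fraction of co-visible views in which two superpoints share a mask; and uses hypothetical thresholds (`min_affinity`, `max_dist`) plus a naive axis-aligned box rule for the detector step. None of these names or rules come from the paper; they only illustrate weighting a superpoint graph by multi-view mask agreement, growing regions along strong, short edges, and using 3D boxes to merge over-segmented parts.

```python
# Hypothetical sketch; not the SA3DIP implementation.
import math


def mask_affinity(masks_a, masks_b):
    """Fraction of co-visible views in which two superpoints fall in the same 2D mask.

    masks_x[v] is the id of the 2D mask that superpoint x mostly projects into
    in view v, or None if x is not visible there (hypothetical encoding).
    """
    shared = [(a, b) for a, b in zip(masks_a, masks_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def labels(parent):
    """Map union-find roots to consecutive instance ids 0..K-1."""
    roots = {}
    return [roots.setdefault(find(parent, k), len(roots)) for k in range(len(parent))]


def grow_regions(view_masks, centroids, adjacency, min_affinity=0.8, max_dist=1.0):
    """Affinity- and distance-aware region growing over a superpoint graph.

    Returns a union-find parent array over superpoints.
    """
    parent = list(range(len(centroids)))
    # Edge weights of the scene graph: multi-view mask affinity.
    weighted = sorted(((mask_affinity(view_masks[i], view_masks[j]), i, j)
                       for i, j in adjacency), reverse=True)
    for affinity, i, j in weighted:  # visit the strongest edges first
        if affinity < min_affinity:
            break  # all remaining edges are weaker
        if math.dist(centroids[i], centroids[j]) <= max_dist:
            parent[find(parent, j)] = find(parent, i)
    return parent


def merge_with_boxes(parent, centroids, boxes):
    """Naive, hypothetical detector-guided merge of superpoints sharing a 3D box.

    boxes holds axis-aligned (min_corner, max_corner) pairs, e.g. from a 3D detector.
    Mutates parent and returns consecutive instance labels.
    """
    for min_corner, max_corner in boxes:
        inside = [k for k, c in enumerate(centroids)
                  if all(low <= x <= high for x, low, high in zip(c, min_corner, max_corner))]
        for k in inside[1:]:
            parent[find(parent, k)] = find(parent, inside[0])
    return labels(parent)


# Toy scene: SAM splits a chair into seat (0) and back (1) in view 0 only,
# and geometry splits a table top into two superpoints (2, 3).
view_masks = [[1, 5], [2, 5], [3, 7], [3, 7]]
centroids = [(0.0, 0.0, 0.5), (0.0, 0.3, 0.8), (2.0, 0.0, 0.7), (2.5, 0.0, 0.7)]
adjacency = [(0, 1), (1, 2), (2, 3)]
boxes = [((-0.5, -0.5, 0.0), (0.5, 0.5, 1.0))]  # one detected chair box

print(labels(grow_regions(view_masks, centroids, adjacency)))  # [0, 1, 2, 2]
print(merge_with_boxes(grow_regions(view_masks, centroids, adjacency), centroids, boxes))  # [0, 0, 1, 1]
```

With these toy values, region growing merges only the two table superpoints (affinity 1.0) and leaves the chair split, because the masks disagree in one of two views (affinity 0.5); the box step then joins the chair parts. This naive box rule would also absorb a small object whose centroid lies inside a larger object's box, whereas the summary states that SA3DIP's refinement preserves fine-grained objects; the sketch makes no attempt to reproduce that safeguard.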

SA3DIP: Segment Any 3D Instance with Potential 3D Priors

SAI3D: Segment Any Instance in 3D Scenes

Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking

SAM-guided Graph Cut for 3D Instance Segmentation

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

SAM3D: Segment Anything in 3D Scenes

3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

Superpoint Transformer for 3D Scene Instance Segmentation

Superpoint-guided Semi-supervised Semantic Segmentation of 3D Point Clouds

SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation

Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks

SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation with Weak-and-Noisy Supervision

Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant

Instance-aware 3D Semantic Segmentation powered by Shape Generators and Classifiers

Segment Anything in 3D with NeRFs

PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models

Segment Anything in 3D with Radiance Fields