Segment Anything in 3D with Radiance Fields

Jiazhong Cen, Jiemin Fang, Zanwei Zhou, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
2024-04-16
Abstract: The Segment Anything Model (SAM) has emerged as a powerful vision foundation model that generates high-quality 2D segmentation results. This paper aims to generalize SAM to segment 3D objects. Rather than replicating the data acquisition and annotation procedure, which is costly in 3D, we design an efficient solution that leverages the radiance field as a cheap, off-the-shelf prior connecting multi-view 2D images to 3D space. We refer to the proposed solution as SA3D, short for Segment Anything in 3D. With SA3D, the user only needs to provide a 2D segmentation prompt (e.g., rough points) for the target object in a single view, from which SAM generates the corresponding 2D mask. SA3D then alternates between mask inverse rendering and cross-view self-prompting across views to iteratively refine the 3D mask of the target object. For one view, mask inverse rendering projects the 2D mask obtained by SAM into 3D space, guided by the density distribution learned by the radiance field, to refine the 3D mask; then, for a new view, cross-view self-prompting automatically extracts reliable prompts for SAM from the 2D mask rendered from the still-inaccurate 3D mask. We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within seconds. Our research reveals a potential methodology for lifting the ability of a 2D segmentation model to 3D. Our code is available at https://github.com/Jumpat/SegmentAnythingin3D.
Computer Vision and Pattern Recognition
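The abstract describes mask inverse rendering only in words. The sketch below illustrates the idea on a toy voxel mask grid: the mask is rendered along each ray with ordinary volume-rendering weights taken from fixed radiance-field densities, and one gradient step pushes voxels seen through SAM-mask pixels up and voxels seen through background pixels down. The loss form, the weight `lam`, the voxel-grid representation, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy representation: a ray is (indices of the mask voxels it samples,
# densities at those samples, SAM label of its pixel: 1 = inside the 2D mask, 0 = outside).
Ray = Tuple[List[int], List[float], int]


def render_weights(sigmas: Sequence[float], delta: float) -> List[float]:
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta))."""
    weights: List[float] = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for sigma in sigmas:
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def mask_inverse_rendering_step(
    mask_grid: List[float],
    rays: List[Ray],
    delta: float = 1.0,
    lam: float = 0.15,  # assumed, illustrative weight for pixels outside the SAM mask
    lr: float = 1.0,
) -> Tuple[List[float], float]:
    """One gradient-descent step on the assumed loss
    L = -sum_r S(r) * M(r) + lam * sum_r (1 - S(r)) * M(r),
    where M(r) = sum_i w_i * mask_grid[idx_i] is the mask rendered along ray r.
    Densities (and therefore w_i) come from a pre-trained radiance field and stay fixed."""
    grad = [0.0] * len(mask_grid)
    loss = 0.0
    for idx, sigmas, sam_label in rays:
        w = render_weights(sigmas, delta)
        rendered = sum(wi * mask_grid[j] for wi, j in zip(w, idx))
        coeff = -sam_label + lam * (1 - sam_label)  # dL / dM(r)
        loss += coeff * rendered
        for wi, j in zip(w, idx):
            grad[j] += coeff * wi  # dM(r) / dmask_grid[j] = w_i
    new_grid = [m - lr * g for m, g in zip(mask_grid, grad)]
    return new_grid, loss


if __name__ == "__main__":
    # Ray A sees voxel 0 (then the occluded voxel 2) through an object pixel;
    # ray B sees voxel 1 (then voxel 2) through a background pixel.
    rays = [([0, 2], [5.0, 5.0], 1), ([1, 2], [5.0, 5.0], 0)]
    grid, _ = mask_inverse_rendering_step([0.0, 0.0, 0.0], rays)
    print([round(v, 3) for v in grid])  # [0.993, -0.149, 0.006]
```

In this toy run, voxel 0 gains a large positive value, voxel 1 becomes negative, and the occluded voxel 2 barely moves because its rendering weight is small; that is how the learned density steers which 3D locations a 2D mask updates. Thresholding the grid (for example at 0) would give a binary 3D mask in this sketch.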
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to extend an existing 2D segmentation model, the Segment Anything Model (SAM), to 3D scenes. Specifically, it proposes a method called SA3D, short for "Segment Anything in 3D."

#### Main Contributions:

1. **Avoiding Expensive Data Collection and Annotation**: Rather than collecting and annotating a large amount of 3D data, SA3D reuses the existing 2D segmentation model SAM and connects multi-view 2D images to 3D space through radiance fields.
2. **Efficient Solution**: By using a radiance field as a cheap, readily available prior, SA3D can generate high-quality 3D segmentation results from a prompt given in a single view.
3. **Iterative Refinement Process**: SA3D iteratively refines the 3D mask of the target object by alternating between mask inverse rendering and cross-view self-prompting.
   - **Mask Inverse Rendering**: Projects the 2D mask obtained by SAM into 3D space, guided by the density learned by the radiance field, to refine the 3D mask.
   - **Cross-view Self-Prompting**: Automatically extracts reliable prompts for SAM in a new view from the 2D mask rendered from the current 3D mask.

#### Experimental Validation:

- The paper evaluates SA3D on multiple datasets and shows that it adapts to different scene types, including forward-facing and 360-degree scenes.
- Experiments also show that SA3D is compatible with different radiance field models, and its working mechanism is analyzed in detail.

### Summary

This paper proposes SA3D, an efficient 3D segmentation method that combines the 2D segmentation model SAM with radiance fields to achieve high-quality 3D segmentation within seconds, and suggests a general way to lift a 2D segmentation model to 3D.
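Cross-view self-prompting turns the mask rendered in a new view into point prompts for SAM. The sketch below shows one simple, hypothetical selection rule: greedily take the most confident pixel above a threshold, suppress a square window around it, and repeat. The function name, the threshold, and the window size are assumptions for illustration; the paper's actual prompt selection may apply additional filtering that is not reproduced here.

```python
from typing import List, Tuple


def self_prompt_points(
    rendered_mask: List[List[float]],
    num_points: int = 3,
    threshold: float = 0.5,  # assumed confidence cut-off
    radius: int = 1,  # assumed half-size of the suppression window
) -> List[Tuple[int, int]]:
    """Greedily pick high-confidence pixels of a rendered 2D mask as positive point
    prompts, suppressing a (2*radius+1) x (2*radius+1) window around each pick so
    the prompts spread over the object. Returns (x, y) pixel coordinates."""
    h, w = len(rendered_mask), len(rendered_mask[0])
    conf = [row[:] for row in rendered_mask]  # working copy that we overwrite
    points: List[Tuple[int, int]] = []
    for _ in range(num_points):
        best, by, bx = threshold, -1, -1
        for y in range(h):
            for x in range(w):
                if conf[y][x] > best:
                    best, by, bx = conf[y][x], y, x
        if by < 0:  # no pixel left above the threshold
            break
        points.append((bx, by))
        # Suppress the neighborhood so the next pick lands elsewhere on the mask.
        for y in range(max(0, by - radius), min(h, by + radius + 1)):
            for x in range(max(0, bx - radius), min(w, bx + radius + 1)):
                conf[y][x] = -1.0
    return points


if __name__ == "__main__":
    mask = [
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.8, 0.0, 0.0],
        [0.0, 0.7, 0.6, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.95],
        [0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    print(self_prompt_points(mask))  # [(4, 3), (1, 1)]
```

In a full pipeline, the returned points could be passed to SAM as positive point prompts, for example as `point_coords` with `point_labels` set to 1 in `SamPredictor.predict` from the `segment_anything` package. The suppression window is a design choice that trades prompt coverage against the risk of picking pixels near the mask boundary.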