Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su
2024-06-26
Abstract: The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing on point clouds. Our approach utilizes a transformer-based method, extending SAM to the 3D domain. We leverage part-level and object-level annotations and introduce a data engine to generate pseudo labels from SAM, thereby distilling 2D knowledge into our 3D model. Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks and demonstrates a variety of applications, such as 3D annotation. Code and a demo are available at https://github.com/zyc00/Point-SAM.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses promptable segmentation of 3D point clouds and proposes a new model, Point-SAM. Specifically:

- **Objective**: Build a promptable 3D segmentation model for point clouds as a first step toward a 3D foundation model. The model should handle point clouds from different data sources in a uniform way and predict valid segmentation masks.
- **Challenges**: Compared with 2D image segmentation, the 3D domain suffers from non-unified data formats, lightweight models, and scarce annotated data with diverse masks. Existing attempts are limited to lifting 2D image results into 3D space, so they depend on factors such as image quality and viewpoint selection, and multi-view consistency is hard to guarantee.
- **Method**: Point-SAM is a Transformer-based model that takes a point cloud as input and produces segmentation masks in response to point prompts and mask prompts. To enlarge the training set, the authors build a data engine that generates pseudo-labels: SAM produces an initial set of diverse mask proposals, which are then refined iteratively.
- **Contributions**: (1) Point-SAM, a 3D foundation model for point clouds; (2) a data engine that generates pseudo-labels with a large number of diverse masks; and (3) 3D segmentation experiments with the extended model and dataset, which demonstrate zero-shot transfer to unseen point cloud distributions.
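
To make the idea of point prompts and mask prompts concrete, the following is a minimal sketch of a promptable segmentation interface on a point cloud. It is not Point-SAM and does not use its API: the function `toy_promptable_segment`, its parameters, and the `radius` rule are all hypothetical, and a simple nearest-prompt heuristic stands in for the Transformer. It only illustrates the interaction pattern, in which foreground and background clicks plus the previous mask go in and a refined per-point mask comes out.

```python
import numpy as np


def toy_promptable_segment(points, prompt_xyz, prompt_labels,
                           prev_mask=None, radius=0.5):
    """Toy stand-in for a promptable 3D segmenter (hypothetical, NOT Point-SAM).

    points:        (N, 3) point coordinates.
    prompt_xyz:    (K, 3) coordinates of the point prompts (clicks).
    prompt_labels: (K,) labels, 1 = foreground click, 0 = background click.
    prev_mask:     optional (N,) boolean mask prompt from an earlier round.
    radius:        assumed influence range of a click (illustration only).
    Returns an (N,) boolean mask.
    """
    points = np.asarray(points, dtype=float)
    prompt_xyz = np.asarray(prompt_xyz, dtype=float)
    prompt_labels = np.asarray(prompt_labels, dtype=int)

    # Pairwise distances between every point and every prompt: shape (N, K).
    dists = np.linalg.norm(points[:, None, :] - prompt_xyz[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(len(points)), nearest]

    # Start from the mask prompt if one is given, otherwise from all background.
    if prev_mask is None:
        mask = np.zeros(len(points), dtype=bool)
    else:
        mask = np.asarray(prev_mask, dtype=bool).copy()

    # Points within range of a click take the label of their nearest click;
    # all other points keep their value from the mask prompt.
    in_range = nearest_dist <= radius
    mask[in_range] = prompt_labels[nearest[in_range]] == 1
    return mask


points = [[0.0, 0, 0], [0.1, 0, 0], [0.4, 0, 0], [2.0, 0, 0]]

# Round 1: a single foreground click selects too much.
round1 = toy_promptable_segment(points, [[0.0, 0, 0]], [1])
print(round1.tolist())  # [True, True, True, False]

# Round 2: a background click plus the previous mask removes the extra point.
round2 = toy_promptable_segment(points, [[0.0, 0, 0], [0.4, 0, 0]], [1, 0],
                                prev_mask=round1)
print(round2.tolist())  # [True, True, False, False]
```

In a real promptable model the heuristic would be replaced by learned encoders for the point cloud and the prompts, but the loop of click, predict, and correct with a further click stays the same.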