Three-Dimensional Object Segmentation Method based on YOLO, SAM, and NeRF

Liang Yuan, Xinkai Wang, Ajian Liu, Shangzhe Wu, Lihui Sun
DOI: https://doi.org/10.1145/3627341.3630370
2023-08-25
Abstract: The neural radiance field (NeRF) representation has shown promising results in capturing 3D scenes. However, editing or moving specific objects within virtual environments requires segmenting individual objects from the neural radiance field. Existing methods primarily rely on feature computation and clustering for 3D segmentation, but they often suffer from poor segmentation quality and low accuracy. Taking advantage of recent advances in 2D image segmentation, we propose a 3D segmentation method that generates a single-object segmentation from images taken at multiple viewpoints. First, we compute camera poses using feature points from the original images. Then, with user guidance, we use the You Only Look Once (YOLO) network model to roughly locate the objects and use the resulting locations as prompts. Subsequently, we employ the Segment Anything Model (SAM) for fine-grained segmentation. The processed dataset is then used to train a neural radiance field that represents the segmented object in 3D and produces high-quality rendered images. By training YOLO on specific objects, we can segment object categories that the publicly available YOLO model cannot recognize, thereby enhancing generalization. Through experiments, we analyze the real-time performance of each module and evaluate the effectiveness of our method both qualitatively and quantitatively. Compared to the N3F and ISRF methods, our approach achieves an average 17% improvement in Intersection over Union (IoU) and an average 9% improvement in Pixel Accuracy (Acc). We validate the performance of our method on the LLFF and Mip-NeRF 360 datasets, demonstrating its generalizability and effectiveness.
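The abstract reports results in Intersection over Union (IoU) and Pixel Accuracy (Acc) but does not define them. The sketch below uses the standard definitions for a pair of binary masks, which may differ in detail from the authors' evaluation code. The function name `iou_and_pixel_accuracy`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def iou_and_pixel_accuracy(pred, target):
    """Return (IoU, pixel accuracy) for two same-shaped binary masks.

    Masks are nested lists of 0/1 (or False/True) values, one inner list per row.
    This is an illustrative helper, not the paper's evaluation code.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = union = correct = total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)  # foreground in both masks
            union += int(p or t)          # foreground in at least one mask
            correct += int(p == t)        # pixel labeled identically in both
            total += 1

    # Assumed convention: two empty masks agree perfectly.
    iou = intersection / union if union else 1.0
    acc = correct / total if total else 1.0
    return iou, acc


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    iou, acc = iou_and_pixel_accuracy(predicted, ground_truth)
    print(f"IoU = {iou:.3f}, Acc = {acc:.3f}")  # IoU = 0.500, Acc = 0.667
```

In the toy example, two pixels are foreground in both masks and four are foreground in at least one, so IoU is 2/4; four of the six pixels carry the same label, so Acc is 4/6. IoU ignores background agreement, which is why it is usually the stricter of the two metrics for small objects.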
Computer Science