GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting

Jilan Mei, Junbo Li, Cai Meng
2024-11-07
Abstract: This paper proposes a new method for accurate and robust 6D pose estimation of novel objects, named GS2Pose. By introducing 3D Gaussian splatting, GS2Pose can utilize the reconstruction results without requiring a high-quality CAD model, which means it only requires segmented RGBD images as input. Specifically, GS2Pose employs a two-stage structure consisting of coarse estimation followed by refined estimation. In the coarse stage, a lightweight U-Net network with a polarization attention mechanism, called Pose-Net, is designed. By using the 3DGS model for supervised training, Pose-Net can generate NOCS images to compute a coarse pose. In the refinement stage, GS2Pose formulates a pose regression algorithm following the idea of reprojection or Bundle Adjustment (BA), referred to as GS-Refiner. By leveraging Lie algebra to extend 3DGS, GS-Refiner obtains a pose-differentiable rendering pipeline that refines the coarse pose by comparing the input images with the rendered images. GS-Refiner also selectively updates parameters in the 3DGS model to achieve environmental adaptation, thereby enhancing the algorithm's robustness and flexibility to illuminative variation, occlusion, and other challenging disruptive factors. GS2Pose was evaluated through experiments conducted on the LineMod dataset, where it was compared with similar algorithms, yielding highly competitive results. The code for GS2Pose will soon be released on GitHub.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper attempts to address the problem of achieving accurate and robust 6D pose estimation of novel objects without high-quality CAD models. Specifically, traditional methods lack robustness when dealing with lighting variations, occlusions, and other disturbances, and usually require high-quality CAD models as input. However, in practical applications, obtaining these high-quality CAD models is often very difficult or impossible. Therefore, this paper proposes a new method, GS2Pose, which utilizes 3D Gaussian Splatting technology and segmented RGBD images to achieve 6D pose estimation of novel objects.

### Main Contributions

1. **No Need for CAD Models**: By introducing 3D Gaussian Splatting technology, lightweight 6D pose estimation of novel objects is achieved without CAD models.
2. **Improved Differentiable Rendering Pipeline**: By modifying the differentiable rendering pipeline of 3D Gaussian Splatting using Lie algebra, an iterative optimization algorithm based on reprojection (GS-Refiner) is developed, which can simultaneously correct object pose and camera pose.
3. **Environmental Adaptability**: By selectively regressing the parameters of the 3D Gaussian Splatting model, a robust 6D pose estimation algorithm is developed that can handle complex lighting, motion blur, and occlusions.
4. **Experimental Validation**: Experiments on the LineMod dataset demonstrate that the GS2Pose model has significant advantages in terms of accuracy, inference speed, and computational resource efficiency.

### Method Overview

1. **Coarse Pose Estimation**: A lightweight U-Net network (Pose-Net) is designed to generate NOCS images through supervised training, thereby calculating a coarse pose.
2. **Pose Refinement**: An iterative optimization algorithm based on 3D Gaussian Splatting reprojection (GS-Refiner) is proposed, which gradually optimizes the pose by continuously minimizing the difference between the rendered image and the input image, ultimately yielding precise output.

### Experimental Results

Experimental results on the LineMod dataset show that GS2Pose outperforms existing methods in multiple categories, especially demonstrating stronger robustness and accuracy in handling complex scenes such as lighting variations and occlusions.

### Conclusion

The GS2Pose method proposed in this paper achieves efficient, accurate, and robust 6D pose estimation of novel objects without high-quality CAD models, providing a new solution for related applications in the field of computer vision.
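
### Illustrative Sketch: Lie-Algebra Pose Update

The refinement stage described above parameterizes pose corrections with Lie algebra. The GS2Pose code is not reproduced here, so the following is only a minimal NumPy sketch of the standard building block such a refiner typically relies on: the exponential map from a 6-vector twist in se(3) to a 4x4 rigid transform, applied as a left-multiplicative update to a current pose estimate. The function names `hat`, `se3_exp`, and `apply_update`, the twist ordering (translation part first, rotation part second), and the left-multiplication convention are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric (cross-product) matrix."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])


def se3_exp(xi):
    """Exponential map: twist xi = (rho, phi) in se(3) -> 4x4 matrix in SE(3).

    rho is the translational part, phi the axis-angle rotational part
    (this ordering is an assumption of the sketch).
    """
    xi = np.asarray(xi, dtype=float)
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-8:
        # Second-order Taylor expansions avoid division by a tiny angle.
        R = np.eye(3) + K + 0.5 * (K @ K)
        V = np.eye(3) + 0.5 * K + (K @ K) / 6.0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * K + b * (K @ K)   # Rodrigues' formula
        V = np.eye(3) + b * K + c * (K @ K)   # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T


def apply_update(T, xi):
    """Left-multiplicative pose update: T_new = exp(xi) @ T."""
    return se3_exp(xi) @ T


if __name__ == "__main__":
    coarse_pose = np.eye(4)                      # stand-in for a coarse estimate
    step = np.array([0.01, 0.0, -0.02, 0.0, 0.0, 0.05])  # hypothetical small correction
    print(apply_update(coarse_pose, step))
```

In an iterative refiner of this kind, the twist would come from an optimizer step that reduces the difference between the rendered and observed images. Optimizing the six unconstrained twist components keeps the rotation a valid rotation matrix after every update, which is the usual motivation for working in the Lie algebra rather than on the matrix entries directly.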