Abstract: This paper proposes GS2Pose, a new method for accurate and robust 6D pose estimation of novel objects. By introducing 3D Gaussian splatting (3DGS), GS2Pose can work from reconstruction results instead of a high-quality CAD model, so it requires only segmented RGBD images as input. GS2Pose uses a two-stage structure: coarse estimation followed by refinement. For the coarse stage, the authors design Pose-Net, a lightweight U-Net with a polarization attention mechanism. Trained under supervision from the 3DGS model, Pose-Net generates NOCS images from which a coarse pose is computed. For the refinement stage, GS2Pose formulates a pose regression algorithm, referred to as GS-Refiner, that follows the idea of reprojection or Bundle Adjustment (BA). By leveraging Lie algebra to extend 3DGS, GS-Refiner obtains a pose-differentiable rendering pipeline that refines the coarse pose by comparing the input images with the rendered images. GS-Refiner also selectively updates parameters of the 3DGS model to adapt to the environment, which improves the algorithm's robustness and flexibility under illumination variation, occlusion, and other challenging disturbances. GS2Pose was evaluated on the LineMod dataset and compared with similar algorithms, yielding highly competitive results. The code for GS2Pose will soon be released on GitHub.
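To make the Lie-algebra idea in the abstract concrete, the following is a minimal sketch of how a pose can be parameterized by a 6-vector in se(3) and updated through the exponential map. It is not taken from GS2Pose: the function names (`hat`, `se3_exp`, `apply_left_update`), the ordering of the 6-vector (translation part first, rotation part second), and the left-multiplicative update convention are all assumptions chosen for illustration. NumPy is assumed to be available.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix (hypothetical helper)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map from se(3) to a 4x4 SE(3) matrix.

    xi = (rho, phi): rho is the translational part, phi the rotational part.
    This ordering is an assumption of the sketch.
    """
    rho, phi = np.asarray(xi[:3], float), np.asarray(xi[3:], float)
    theta = np.linalg.norm(phi)
    W = hat(phi)
    if theta < 1e-8:
        # Second-order Taylor expansion near the identity.
        R = np.eye(3) + W + 0.5 * W @ W
        V = np.eye(3) + 0.5 * W + W @ W / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * W @ W   # Rodrigues' formula
        V = np.eye(3) + B * W + C * W @ W   # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def apply_left_update(T, xi):
    """Apply a small se(3) increment to pose T (left-multiplicative convention)."""
    return se3_exp(xi) @ T
```

In a refinement loop of the kind the abstract describes, the increment `xi` would come from the gradient of an image-comparison loss between the rendered and input images; that rendering step is outside the scope of this sketch. Because the update is an unconstrained 6-vector, the rotation stays a valid rotation matrix without any re-orthonormalization.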

What problem does this paper attempt to address?

This paper attempts to address the problem of achieving accurate and robust 6D pose estimation of novel objects without high-quality CAD models. Specifically, traditional methods lack robustness when dealing with lighting variations, occlusions, and other disturbances, and usually require high-quality CAD models as input. However, in practical applications, obtaining these high-quality CAD models is often very difficult or impossible. Therefore, this paper proposes a new method, GS2Pose, which utilizes 3D Gaussian Splatting technology and segmented RGBD images to achieve 6D pose estimation of novel objects.

### Main Contributions

1. **No Need for CAD Models**: By introducing 3D Gaussian Splatting technology, lightweight 6D pose estimation of novel objects is achieved without CAD models.
2. **Improved Differentiable Rendering Pipeline**: By modifying the differentiable rendering pipeline of 3D Gaussian Splatting using Lie algebra, an iterative optimization algorithm based on reprojection (GS-Refiner) is developed, which can simultaneously correct object pose and camera pose.
3. **Environmental Adaptability**: By selectively regressing the parameters of the 3D Gaussian Splatting model, a robust 6D pose estimation algorithm is developed that can handle complex lighting, motion blur, and occlusions.
4. **Experimental Validation**: Experiments on the LineMod dataset demonstrate that the GS2Pose model has significant advantages in terms of accuracy, inference speed, and computational resource efficiency.

### Method Overview

1. **Coarse Pose Estimation**: A lightweight U-Net network (Pose-Net) is designed to generate NOCS images through supervised training, thereby calculating a coarse pose.
2. **Pose Refinement**: An iterative optimization algorithm based on 3D Gaussian Splatting reprojection (GS-Refiner) is proposed, which gradually optimizes the pose by continuously minimizing the difference between the rendered image and the input image, ultimately yielding precise output.

### Experimental Results

Experimental results on the LineMod dataset show that GS2Pose outperforms existing methods in multiple categories, especially demonstrating stronger robustness and accuracy in handling complex scenes such as lighting variations and occlusions.

### Conclusion

The GS2Pose method proposed in this paper achieves efficient, accurate, and robust 6D pose estimation of novel objects without high-quality CAD models, providing a new solution for related applications in the field of computer vision.
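The coarse stage in the method overview computes a pose from a predicted NOCS image. The summary does not say which solver GS2Pose uses for this step. One common option with RGBD input is to back-project the masked depth pixels to 3D and align them with the predicted NOCS coordinates using Umeyama's least-squares similarity transform. The sketch below illustrates that option only; `backproject` and `umeyama_similarity` are hypothetical helper names, the pinhole intrinsics matrix `K` is assumed, and NumPy is assumed to be available.

```python
import numpy as np

def backproject(depth, mask, K):
    """Lift masked depth pixels to 3D camera-frame points, shape (N, 3).

    depth: (H, W) depth map; mask: (H, W) boolean object mask;
    K: 3x3 pinhole intrinsics (assumed camera model).
    """
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src: (N, 3) points in the normalized object frame (e.g. NOCS values minus 0.5).
    dst: (N, 3) corresponding points in the camera frame.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                 # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Given a NOCS image `nocs` of shape (H, W, 3) with values in [0, 1], the correspondences would be `nocs[mask] - 0.5` as `src` and `backproject(depth, mask, K)` as `dst`; both index pixels in the same row-major order, so the rows line up. In practice the alignment is usually wrapped in RANSAC to tolerate outlier predictions, which this sketch omits.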

GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting

GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

KGNet: Knowledge-Guided Networks for Category-Level 6D Object Pose and Size Estimation

Object Gaussian for Monocular 6D Pose Estimation from Sparse Views

P$^2$GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

GeoPose: Dense Reconstruction Guided 6D Object Pose Estimation with Geometric Consistency

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

DualPoseNet: Category-level 6D Object Pose and Size Estimation Using Dual Pose Network with Refined Learning of Pose Consistency

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos

Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization

Prior Geometry Guided Direct Regression Network for Monocular 6D Object Pose Estimation