GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit
2024-03-15
Abstract: We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative "templates", rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest-neighbor search in feature space, resulting in a speedup factor of 35x compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper proposes a method called **GigaPose**, which aims to address two main issues in novel object pose estimation (6D pose estimation):

1. **Low inference speed**: Existing coarse pose estimation methods rely on template matching, which makes them slow. For example, MegaPose requires over 1.6 seconds to process each detected object.
2. **Sensitivity to segmentation errors**: Existing template matching methods perform poorly when the segmentation is inaccurate, for instance because of occlusion.

Specifically, GigaPose addresses these issues in the following ways:

- It matches the input image to templates using local features and nearest-neighbor search, achieving a 35-fold speedup in template search.
- In the coarse pose estimation stage, it estimates the remaining four degrees of freedom (i.e., in-plane rotation and translation) from a single 2D-2D correspondence, improving robustness to segmentation errors.

Experimental results show that GigaPose achieves state-of-the-art accuracy on the seven core datasets of the BOP challenge and can be seamlessly integrated with existing refinement methods to achieve higher accuracy at faster speeds. Additionally, GigaPose can use 3D models predicted from a single image for pose estimation, reducing the need for precise CAD models and making 6D pose estimation more convenient.
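
To make the single-correspondence idea concrete, here is a minimal sketch of how one 2D-2D match can fix four parameters. It is not the authors' implementation. It assumes that each correspondence comes with a relative scale and an in-plane rotation angle (how those are obtained is not described in this summary), and that the 2D scale and 2D offset stand in for the 3D translation. The function name `similarity_from_correspondence` and all numeric values are hypothetical.

```python
import numpy as np

def similarity_from_correspondence(p_template, p_query, scale, angle_rad):
    """Build a 3x3 2D similarity transform mapping template pixels to query pixels.

    Hypothetical helper: `scale` and `angle_rad` are ASSUMED to be known for
    this correspondence; only the 2D offset is solved for here.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    # Scaled in-plane rotation (2 of the 4 parameters: scale and angle).
    sR = scale * np.array([[c, -s], [s, c]])
    # The offset (the other 2 parameters) follows from p_query = sR @ p_template + t.
    t = np.asarray(p_query, dtype=float) - sR @ np.asarray(p_template, dtype=float)
    M = np.eye(3)
    M[:2, :2] = sR
    M[:2, 2] = t
    return M

# Illustrative values only: one matched patch center in the template and the query.
M = similarity_from_correspondence((100, 80), (250, 140), scale=0.5, angle_rad=np.pi / 2)

# The matched template point lands exactly on its query counterpart.
mapped_match = (M @ np.array([100.0, 80.0, 1.0]))[:2]

# A neighboring template pixel is rotated and scaled around that match.
mapped_neighbor = (M @ np.array([110.0, 80.0, 1.0]))[:2]
```

Because the transform is pinned down by one match rather than by the full object mask, a single reliable correspondence is enough to place the template in the image, which is consistent with the robustness to segmentation errors described above.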