Abstract: Object pose estimation is a prominent task in computer vision. The object pose gives the orientation and translation of the object in real-world space, which allows various applications such as manipulation, augmented reality, etc. Various objects exhibit different properties with light, such as reflections, absorption, etc. This makes it challenging to understand the object's structure in RGB and depth channels. Recent research has been moving toward learning-based methods, which provide a more flexible and generalizable approach to object pose estimation utilizing deep learning. One such approach is the render-and-compare method, which renders the object from multiple views and compares it against the given 2D image, which often requires an object representation in the form of a CAD model. We reason that the synthetic texture of the CAD model may not be ideal for rendering and comparing operations. We showed that if the object is represented as an implicit (neural) representation in the form of Neural Radiance Field (NeRF), it exhibits a more realistic rendering of the actual scene and retains the crucial spatial features, which makes the comparison more versatile. We evaluated our NeRF implementation of the render-and-compare method on transparent datasets and found that it surpassed the current state-of-the-art results.

What problem does this paper attempt to address?

The problem this paper addresses is 6D pose estimation of transparent objects. Specifically, the authors aim to improve pose estimation accuracy for transparent objects by using implicit representations, in particular the Neural Radiance Field (NeRF). Traditional methods struggle with transparent, reflective, or other non-Lambertian surfaces, because the appearance of these surfaces depends on the viewing angle and the background and is difficult to represent with conventional explicit textures.

### Main contributions of the paper

1. **Proposed a 6D pose estimation pipeline for transparent objects based on a single RGB image and sparse multi-view images**: The method does not require pre-training a pose estimator for specific object instances.
2. **Combined the classical render-and-compare method with NeRF for view synthesis**: NeRF is used to generate high-quality, view-dependent hypotheses of the transparent object, which improves the accuracy of pose estimation.
3. **Tested the proposed method on four large-scale datasets**: These datasets contain transparent and reflective household items in complex environments. Results are reported with multiple evaluation metrics (MSPD, MSSD, ADD, ADD-S, translation error, rotation error, and 3D IoU).

### Method overview

1. **Data collection**: Simulate realistic non-Lambertian properties by applying reflection or transmission shaders to CAD models, and render high-quality images from different viewing angles to optimize the NeRF.
2. **NeRF training**: Train the NeRF using the volume rendering equation so that it can represent scenes with complex geometry and appearance.
3. **Coarse estimation and refinement block**: First, perform a coarse estimation over the sampled rendered views, posed as a classification task. Then refine the pose iteratively by adding small translation and rotation adjustments.
4. **Fine-tuning process**: Fine-tune MegaPose6D with NeRF view synthesis on a synthetic dataset containing transparent objects to improve performance on transparent objects.

### Evaluation results

The experimental results show that this method outperforms existing methods on multiple benchmark datasets. In the glass category in particular, fine-tuning significantly improves the results, even under more stringent IoU thresholds.

### Conclusion

The NeRF-based render-and-compare method proposed in this paper demonstrates the potential for pose estimation of unseen transparent objects using only RGB images. As a representation, NeRF produces more realistic renderings across viewing angles, which improves the accuracy of pose estimation.
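As a rough illustration of the volume rendering equation mentioned in the method overview, the sketch below composites colour along a single ray using the standard NeRF quadrature: each sample contributes its colour weighted by its opacity and by the transmittance of everything in front of it. This is a minimal NumPy sketch, not the paper's implementation; the function name `composite_ray` and its argument layout are assumptions made for this example.

```python
import numpy as np

def composite_ray(sigmas, colors, t_vals):
    """Composite colour along one ray with the standard NeRF quadrature.

    sigmas: (N,) non-negative densities at the samples
    colors: (N, 3) RGB values at the samples
    t_vals: (N,) increasing sample depths along the ray
    (Hypothetical helper for illustration only.)
    """
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    t_vals = np.asarray(t_vals, dtype=np.float64)

    # Distance between adjacent samples; the last interval is treated as very long.
    deltas = np.append(np.diff(t_vals), 1e10)

    # Opacity of each interval: alpha_i = 1 - exp(-sigma_i * delta_i).
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # Transmittance T_i = prod_{j<i} (1 - alpha_j): light surviving up to sample i.
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))

    # Per-sample contribution to the pixel, then the weighted colour sum.
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

Because the weights depend on density rather than on a fixed surface, a partially transparent sample lets colour from samples behind it through, which is one reason this formulation suits glass-like objects better than an opaque textured mesh.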
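Two of the metrics listed above, ADD and ADD-S, can be sketched as follows using their commonly used definitions: ADD averages the distance between corresponding model points under the ground-truth and estimated poses, while ADD-S averages the distance from each ground-truth point to its nearest estimated point, which makes it tolerant of object symmetries. The function names and the brute-force nearest-neighbour search are assumptions for this example; the summary does not state how the paper samples model points or which thresholds it applies.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform (R: 3x3 rotation, t: 3-vector) to (N, 3) points."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding transformed model points."""
    diff = transform(points, R_gt, t_gt) - transform(points, R_est, t_est)
    return float(np.linalg.norm(diff, axis=1).mean())

def add_s_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to the closest estimated point."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    # Brute-force pairwise distances, shape (N, N); adequate for a few thousand points.
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

# Toy symmetric "object": four points on a square, rotated 90 degrees about z.
pts = np.array([[1.0, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]])
Rz90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
print(add_metric(pts, np.eye(3), np.zeros(3), Rz90, np.zeros(3)))
print(add_s_metric(pts, np.eye(3), np.zeros(3), Rz90, np.zeros(3)))
```

In the toy case the rotation maps the point set onto itself, so ADD reports a large error while ADD-S reports none. Many transparent household items such as glasses and bottles are rotationally symmetric, which is why both metrics are typically reported together.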

Object Pose Estimation Using Implicit Representation For Transparent Objects

Pose Estimation and Neural Implicit Reconstruction Towards Non-Cooperative Spacecraft Without Offline Prior Information

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

NeRF-Feat: 6D Object Pose Estimation using Feature Rendering

ReN Human: Learning Relightable Neural Implicit Surfaces for Animatable Human Rendering

Pose-Free Neural Radiance Fields via Implicit Pose Regularization

6DoF Pose Estimation of Transparent Object from a Single RGB-D Image

NeuralTO: Neural Reconstruction and View Synthesis of Translucent Objects

Neural Correspondence Field for Object Pose Estimation

Diff-DOPE: Differentiable Deep Object Pose Estimation

Pose Estimation of Specific Rigid Objects

Pose Estimation for Texture-less Shiny Objects in a Single RGB Image Using Synthetic Training Data

Object-Based Illumination Estimation with Rendering-Aware Neural Networks

Generalizable Pose Estimation Using Implicit Scene Representations

ER-Pose: Learning Edge Representation for 6D Pose Estimation of Texture-Less Objects

FocalPose++: Focal Length and Object Pose Estimation via Render and Compare

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

Object Pose Estimation Using Edge Images Synthesized from Shape Information

Object Pose Estimation Using Color Images and Predicted Depth Maps

NARF24: Estimating Articulated Object Structure for Implicit Rendering

Neural Capture of Animatable 3D Human from Monocular Video