Abstract: Estimating the poses of rigid objects is a fundamental problem in computer vision, with applications across automation and augmented reality. Most existing approaches train one network per object class, depend heavily on 3D object models and depth data, and employ time-consuming iterative refinement, which can be impractical for some applications. This paper presents CVAM-Pose, a novel approach for multi-object monocular pose estimation that addresses these limitations. CVAM-Pose employs a label-embedded conditional variational autoencoder network to implicitly abstract regularised representations of multiple objects in a single low-dimensional latent space. This autoencoding process uses only images captured by a projective camera and is robust to object occlusion and scene clutter. Object classes are one-hot encoded and embedded throughout the network. The proposed label-embedded pose regression strategy interprets the learnt latent representations using continuous pose representations. Ablation tests and systematic evaluations demonstrate the scalability and efficiency of CVAM-Pose in multi-object scenarios. CVAM-Pose outperforms competing latent-space approaches: for example, when evaluated with the $\mathrm{AR_{VSD}}$ metric on the Linemod-Occluded dataset, it is 25% and 20% better than the AAE and Multi-Path methods, respectively. It also achieves results somewhat comparable to those reported in the BOP challenges by methods that rely on 3D models. Code available: https://github.com/JZhao12/CVAM-Pose
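
The abstract names the building blocks of CVAM-Pose but not their exact form. The framework-free sketch below illustrates them under stated assumptions: the class label is conditioned by concatenating a one-hot vector to a layer's input (one plausible reading of "embedded throughout the network"), the latent code is sampled with the standard VAE reparameterisation trick and regularised by a KL term towards $\mathcal{N}(0, I)$, and the unspecified "continuous pose representation" is taken to be the 6D rotation representation of Zhou et al., mapped to a rotation matrix by Gram-Schmidt orthogonalisation. All helper names (`one_hot`, `embed_label`, `reparameterise`, `kl_to_standard_normal`, `rotation_from_6d`) and the example values are hypothetical and are not taken from the CVAM-Pose repository.

```python
import math
import random


def one_hot(class_id: int, num_classes: int) -> list[float]:
    """One-hot encode an object class index."""
    if not 0 <= class_id < num_classes:
        raise ValueError("class_id out of range")
    vec = [0.0] * num_classes
    vec[class_id] = 1.0
    return vec


def embed_label(features: list[float], label: list[float]) -> list[float]:
    """Condition a layer input on the object class by appending the one-hot label.
    Assumption: concatenation is the label-embedding mechanism."""
    return features + label


def reparameterise(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1); in a trained VAE this form
    lets gradients flow back to mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: list[float], log_var: list[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)); this term regularises the shared latent space."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def _normalise(v: list[float]) -> list[float]:
    # Scale a vector to unit Euclidean length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross(a: list[float], b: list[float]) -> list[float]:
    # 3D cross product a x b.
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotation_from_6d(r6: list[float]) -> list[list[float]]:
    """Map a 6D network output (two 3D vectors) to a rotation matrix by Gram-Schmidt.
    Assumption: the 6D representation of Zhou et al. stands in for the paper's
    unspecified continuous pose representation."""
    a1, a2 = r6[:3], r6[3:6]
    b1 = _normalise(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = _normalise([y - dot * x for x, y in zip(b1, a2)])  # remove the b1 component of a2
    b3 = _cross(b1, b2)                                      # right-handed third axis
    # b1, b2, b3 are the columns of R; return R as a row-major 3x3 list.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


if __name__ == "__main__":
    rng = random.Random(0)
    label = one_hot(2, num_classes=3)                  # hypothetical class index and class count
    encoder_in = embed_label([0.1, -0.4, 0.7], label)  # stand-in for image features
    mu, log_var = [0.0, 0.5], [0.0, -1.0]              # stand-in encoder outputs
    z = reparameterise(mu, log_var, rng)
    decoder_in = embed_label(z, label)                 # the label is embedded again for the decoder
    print(len(encoder_in), len(decoder_in), kl_to_standard_normal(mu, log_var))
    print(rotation_from_6d([0.3, -1.2, 0.8, 2.0, 0.1, -0.5]))
```

In a full model these helpers would wrap learned encoder, decoder, and regression layers; they are kept in plain Python here so each step can be checked by hand. A continuous target such as the 6D representation is often preferred for regression because Euler angles and quaternions have discontinuities or sign ambiguities that make the mapping harder for a network to learn.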

KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

2-Entity Random Sample Consensus for Robust Visual Localization: Framework, Methods, and Verifications

Unseen Object Pose Estimation via Registration

Pose Estimation for Cross-Domain Non-Cooperative Spacecraft Based on Spatial-Aware Keypoints Regression

PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

Robust Stereo Visual Odometry Using Improved RANSAC-Based Methods for Mobile Robot Localization

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling

KDFNet: Learning Keypoint Distance Field for 6D Object Pose Estimation

RNNPose: 6-DoF Object Pose Estimation Via Recurrent Correspondence Field Estimation and Pose Optimization

RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration

Generalized Differentiable RANSAC

MarkerPose: Robust Real-time Planar Target Tracking for Accurate Stereo Pose Estimation

CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation

Multi-View Keypoints for Reliable 6D Object Pose Estimation

PVNet: Pixel-Wise Voting Network for 6DoF Object Pose Estimation

REDE: End-to-End Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination