Abstract: Object detection and 6D pose estimation in the crowd (scenes with multiple object instances, severe foreground occlusions and background distractors) has become an important problem in many rapidly evolving technological areas such as robotics and augmented reality. Single-shot 6D pose estimators with manually designed features are still unable to tackle the above challenges, motivating research into unsupervised feature learning and next-best-view estimation. In this work, we present a complete framework for both single-shot 6D object pose estimation and next-best-view prediction based on Hough Forests, the state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we a) propose an unsupervised feature learnt from depth-invariant patches using a Sparse Autoencoder and b) offer an extensive evaluation of various state-of-the-art features. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, formulating the problem of selecting the next-best-view. To further improve pose estimation, we propose an improved joint registration and hypotheses verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework. One is related to domestic environments and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art on both public datasets and our own.

Active 6D Multi-Object Pose Estimation in Cluttered Scenarios with Deep Reinforcement Learning

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Estimating 6D Object Poses with Temporal Motion Reasoning for Robot Grasping in Cluttered Scenes

Proactive Multi-Camera Collaboration For 3D Human Pose Estimation

Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects

Multi-Agent Deep Reinforcement Learning for Online 3D Human Poses Estimation

SLAM-Supported Self-Training for 6D Object Pose Estimation

MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion

Simultaneous Semantic and Collision Learning for 6-DoF Grasp Pose Estimation

TrackAgent: 6D Object Tracking via Reinforcement Learning

6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping

A Multi-Hypothesis Approach to Pose Ambiguity in Object-Based SLAM

Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd

Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping

Spatial Feature Mapping for 6DoF Object Pose Estimation

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation

Deep Learning-Based 6-DoF Object Pose Estimation Considering Synthetic Dataset

Hierarchical Policies for Cluttered-Scene Grasping with Latent Plans

Self-supervised 6D Object Pose Estimation for Robot Manipulation