3D Convolution on RGB-D Point Clouds for Accurate Model-free Object Pose Estimation

Zhongang Cai, Cunjun Yu, Quang-Cuong Pham
DOI: https://doi.org/10.48550/arXiv.1812.11284
2018-12-29
Abstract: The conventional pose estimation of a 3D object usually requires the knowledge of the 3D model of the object. Even with the recent development in convolutional neural networks (CNNs), a 3D model is often necessary in the final estimation. In this paper, we propose a two-stage pipeline that takes in raw colored point cloud data and estimates an object's translation and rotation by running 3D convolutions on voxels. The pipeline is simple yet highly accurate: translation error is reduced to the voxel resolution (around 1 cm) and rotation error is around 5 degrees. The pipeline is also put to actual robotic grasping tests where it achieves above 90% success rate for test objects. Another innovation is that a motion capture system is used to automatically label the point cloud samples which makes it possible to rapidly collect a large amount of highly accurate real data for training the neural networks.
Robotics, Computer Vision and Pattern Recognition, Machine Learning
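
The abstract describes running 3D convolutions on voxels built from a raw colored point cloud, at a resolution of around 1 cm. As a rough illustration of that input step only, the sketch below bins points into a fixed grid with one occupancy channel and three mean-color channels. The function name `voxelize`, the channel layout, the grid size, and the `origin` parameter are all assumptions made for this example; the abstract does not specify how the authors build their voxel input.

```python
import numpy as np

def voxelize(points, colors, origin, voxel_size=0.01, grid_shape=(32, 32, 32)):
    """Bin a colored point cloud into a (4, X, Y, Z) voxel grid.

    Channel 0 is occupancy (0 or 1); channels 1-3 hold the mean RGB of the
    points that fall in each voxel. This layout is a hypothetical choice for
    illustration, not the layout used in the paper.
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    shape = np.asarray(grid_shape)

    # Integer voxel index of every point, relative to the grid origin.
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(np.int64)

    # Discard points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < shape), axis=1)
    idx, colors = idx[inside], colors[inside]

    # Flatten the 3D indices so per-voxel sums can be taken with bincount.
    flat = np.ravel_multi_index(tuple(idx.T), grid_shape)
    n_vox = int(np.prod(grid_shape))
    counts = np.bincount(flat, minlength=n_vox)

    grid = np.zeros((4, n_vox))
    grid[0] = counts > 0
    for c in range(3):
        sums = np.bincount(flat, weights=colors[:, c], minlength=n_vox)
        # Mean color per voxel; empty voxels stay at zero.
        grid[1 + c] = np.divide(sums, counts, out=np.zeros(n_vox), where=counts > 0)

    return grid.reshape((4,) + tuple(grid_shape))
```

A grid of this shape can be fed to a 3D convolution layer in most deep learning frameworks after adding a batch dimension. With a 1 cm voxel size, a translation estimate expressed in voxel units cannot be finer than about 1 cm, which is consistent with the translation error quoted in the abstract.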