Learning Semantic Keypoint Representations for Door Opening Manipulation

Jiayu Wang, Shize Lin, Chuxiong Hu, Yu Zhu, Limin Zhu
DOI: https://doi.org/10.1109/lra.2020.3026963
2020-01-01
Abstract: We consider a scenario where a robot is capable of autonomously opening previously unseen doors. Prior works either use model-based methods that rely heavily on accurate kinematic models, or learn a policy from scratch through trial and error, which cannot generalize to large variations in the shape and location of doors. In this letter, we propose a novel method for opening unseen doors with no prior knowledge of the door model. The method uses semantic 3D keypoints as door handle representations, from which a motion planner generates the end-effector trajectory. A deep neural network predicts the keypoint representations from raw visual input; they provide a concise, semantic description of the handle that determines the grasp pose and guides subsequent motion planning. In contrast to existing works that require known object models or significant manual effort on data collection, we present a data augmentation technique that automatically generates large amounts of realistic-looking synthetic data with almost no human labeling effort. An augmented dataset, consisting of large amounts of synthetic data and small amounts of real data, is used to train the network. Qualitative results show that our proposed method outperforms the state-of-the-art pose-based methods on real test sets in terms of perception metrics. Hardware experiments demonstrate that our proposed method achieves a 94.2% success rate on opening 6 previously unseen doors with significant shape variations under different environments and conditions.