One-Shot Imitation Learning: A Pose Estimation Perspective

Pietro Vitiello, Kamil Dreczkowski, Edward Johns
2023-10-19
Abstract: In this paper, we study imitation learning under the challenging setting of: (1) only a single demonstration, (2) no further data collection, and (3) no prior task or object knowledge. We show how, with these constraints, imitation learning can be formulated as a combination of trajectory transfer and unseen object pose estimation. To explore this idea, we provide an in-depth study on how state-of-the-art unseen object pose estimators perform for one-shot imitation learning on ten real-world tasks, and we take a deep dive into the effects that camera calibration, pose estimation error, and spatial generalisation have on task success rates. For videos, please visit https://www.robot-learning.uk/pose-estimation-perspective.
Robotics, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper studies one-shot imitation learning for robot manipulation under three challenging conditions:

1. **Only one demonstration is provided**: The robot must learn the task from a single demonstration.
2. **No further data collection**: After the initial demonstration, no additional data collection or environment interaction takes place.
3. **No prior knowledge of tasks or objects**: The robot has no prior knowledge of the task or of the objects being manipulated.

Specifically, the paper formulates one-shot imitation learning as a combination of trajectory transfer and unseen object pose estimation. The authors explore the following questions through a series of experiments:

#### Sections 4.1 and 4.2: The influence of pose estimation error on task success rate

- Study how camera calibration error and pose estimation error affect the task success rate.
- Show experimentally that pose estimation error has a greater impact on the task success rate than camera calibration error, and that rotation error is more critical than position error.

#### Section 4.3: Benchmarking real-world tasks

- Evaluate different one-shot unseen object pose estimation methods on ten real-world everyday robot tasks.
- Compare these methods with an existing state-of-the-art one-shot imitation learning method (DOME).

#### Section 4.4: Spatial generalization ability

- Explore how robust trajectory transfer is, and how well it generalizes, when the object pose at test time differs from the pose during the demonstration.
- Analyze the task success rate across different object positions and orientations to reveal how changes in object pose affect it.

### Key conclusions

1. **The importance of pose estimation**: Accurate pose estimation is crucial for one-shot imitation learning, and rotation error has a more significant effect than position error.
2. **Comparison of multiple methods**: Eight pose estimation methods are compared in simulation and real-world experiments, and the regression-based method is found to perform best.
3. **Spatial generalization ability**: The task success rate decreases as the object pose moves away from the demonstrated pose, but some methods (such as the regression-based one) remain comparatively robust.

### Formula summary

- **Trajectory transfer formula**:
  \[ T_{Test}^{RE_t}=T_{Test}^{RO}T_{Demo}^{OE_t} \]
  where \(T_{Test}^{RO}\) is the transformation matrix of the object relative to the robot at test time, and \(T_{Demo}^{OE_t}\) is the transformation matrix of the end-effector relative to the object at timestep \(t\) of the demonstration.
- **Relative pose transformation**:
  \[ R_{\delta}^R = T_{Test}^{RO}T_{Demo}^{OR} \]
  where \(T_{Demo}^{OR} = (T_{Demo}^{RO})^{-1}\). It represents the transformation of the object from the demonstration scene to the test scene, expressed in the robot coordinate system \(R\).
- **Relative pose transformation in the camera coordinate system**:
  \[ C_{\delta}^C=T_{Test}^{CO}(T_{Demo}^{CO})^{-1} \]
  It represents the transformation of the object from the demonstration scene to the test scene, expressed in the camera coordinate system \(C\).

Through these studies, the paper provides new insights into one-shot imitation learning and shows its potential in real-world tasks.
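
The three formulas in the summary can be checked numerically with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the helper `make_pose`, the function `transfer_trajectory`, all pose values, and the fixed camera extrinsic `T_RC` are hypothetical. The sketch transfers a two-waypoint demonstration to a new object pose, confirms that applying the relative transformation in the robot frame to the demonstrated end-effector poses gives the same result, and recovers that transformation from its camera-frame counterpart, assuming the camera does not move between demonstration and test.

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous transform: rotation about z, then translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T


def transfer_trajectory(T_test_RO, T_demo_OE_traj):
    """Apply T_Test^{RE_t} = T_Test^{RO} T_Demo^{OE_t} to every waypoint t."""
    return [T_test_RO @ T_OE for T_OE in T_demo_OE_traj]


# Hypothetical demonstration: object pose and two end-effector waypoints,
# all expressed in the robot frame R.
T_demo_RO = make_pose(0.0, [0.5, 0.0, 0.1])
T_demo_RE_traj = [
    make_pose(0.0, [0.5, 0.0, 0.3]),
    make_pose(0.0, [0.5, 0.0, 0.15]),
]

# Express each demonstrated waypoint relative to the object: T_Demo^{OE_t}.
T_demo_OR = np.linalg.inv(T_demo_RO)
T_demo_OE_traj = [T_demo_OR @ T_RE for T_RE in T_demo_RE_traj]

# Hypothetical test scene: object translated and rotated 90 degrees about z.
T_test_RO = make_pose(np.pi / 2, [0.4, 0.2, 0.1])
T_test_RE_traj = transfer_trajectory(T_test_RO, T_demo_OE_traj)

# Equivalent route: relative transformation in the robot frame (R_delta^R in
# the summary), applied directly to the demonstrated end-effector poses.
T_delta_R = T_test_RO @ T_demo_OR
T_test_RE_alt = [T_delta_R @ T_RE for T_RE in T_demo_RE_traj]

# Camera-frame version (C_delta^C), assuming a fixed, hypothetical extrinsic
# T_RC (pose of the camera in the robot frame).
T_RC = make_pose(np.pi, [1.0, 0.0, 0.8])
T_CR = np.linalg.inv(T_RC)
T_demo_CO = T_CR @ T_demo_RO
T_test_CO = T_CR @ T_test_RO
T_delta_C = T_test_CO @ np.linalg.inv(T_demo_CO)

# Changing frames with the extrinsic recovers the robot-frame transformation.
T_delta_R_from_C = T_RC @ T_delta_C @ T_CR
```

In this sketch the extrinsic `T_RC` is only needed in the last step, when the camera-frame transformation is mapped back to the robot frame, so an error in `T_RC` would propagate into `T_delta_R_from_C` at that point.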