Self-Supervised Multi-Modal Learning for Collaborative Robotic Grasp-Throw

Yanxu Hou, Zihan Fang, Jun Li
DOI: https://doi.org/10.1109/lra.2024.3376151
IF: 5.2
2024-01-01
IEEE Robotics and Automation Letters
Abstract: Accurate throwing skills can expand the pick-and-place ability of a manipulator, which is significant but challenging in the field of robotics. Most existing robotic throwing methods neglect the mass of an object and air drag, not to mention the effect of a grasp on the subsequent throw, resulting in inaccurate throws. To address this, we propose collaborative grasping and throwing learning (CGTL). It consists of a grasp agent with a grasping network (G-Net), a throw agent with a learning-based throw reference (LTR), and a multi-modal throw compensator network (MTC-Net). First, G-Net generates multi-channel grasp affordances for inferring grasps. Subsequently, LTR predicts a throw velocity reference by exploiting an air resistance estimation network (ARE-Net) and a projectile equation that accounts for air drag. Meanwhile, MTC-Net uses multi-modal data to predict a compensation for the throw velocity reference. Moreover, CGTL incorporates throwing performance into the reward of the grasp agent and the grasp affordances into the throw agent's observation to facilitate more accurate throwing. Finally, extensive experiments show that CGTL outperforms its peers in throwing accuracy, especially when throwing different objects to new positions.
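The abstract does not state the form of the projectile equation or the drag model, so the following is only a minimal sketch of the general idea behind a drag-aware throw velocity reference. It assumes a point mass, quadratic drag, a fixed release angle, and release and landing at the same height; none of these are taken from the paper. The names `landing_distance`, `solve_launch_speed`, and `drag_k` are hypothetical, with `drag_k` standing in for the kind of quantity a network such as ARE-Net might estimate.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_distance(speed, angle_rad, drag_k=0.0, mass=1.0, dt=1e-4):
    """Horizontal range of a point mass released at the origin, landing at z = 0.

    Assumed drag model (illustrative only): F = -drag_k * |v| * v.
    Integrated with a semi-implicit Euler scheme of step dt.
    """
    x, z = 0.0, 0.0
    vx = speed * math.cos(angle_rad)
    vz = speed * math.sin(angle_rad)
    while True:
        v = math.hypot(vx, vz)
        # Acceleration from gravity plus quadratic drag opposing the velocity.
        ax = -(drag_k / mass) * v * vx
        az = -G - (drag_k / mass) * v * vz
        vx += ax * dt
        vz += az * dt
        x_new = x + vx * dt
        z_new = z + vz * dt
        if z_new < 0.0:
            # Linearly interpolate the last step to the ground crossing.
            frac = z / (z - z_new)
            return x + frac * (x_new - x)
        x, z = x_new, z_new


def solve_launch_speed(target, angle_rad, drag_k=0.0, mass=1.0,
                       lo=0.0, hi=20.0, tol=1e-4):
    """Bisection on release speed so that the simulated range matches target."""
    if landing_distance(hi, angle_rad, drag_k, mass) < target:
        raise ValueError("target is out of reach within the assumed speed limit")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if landing_distance(mid, angle_rad, drag_k, mass) < target:
            lo = mid  # throw falls short: increase speed
        else:
            hi = mid  # throw overshoots: decrease speed
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    angle = math.radians(45)
    # Hypothetical object: 0.1 kg with an assumed drag constant of 0.05 kg/m.
    v_ref = solve_launch_speed(1.0, angle, drag_k=0.05, mass=0.1)
    print(f"release speed for a 1.0 m throw: {v_ref:.3f} m/s")
```

In this sketch, a heavier object or a smaller `drag_k` needs a lower release speed for the same target, which illustrates why neglecting mass and air drag can lead to inaccurate throws. A learned compensator in the spirit of MTC-Net would then correct the residual error that such a simplified model leaves.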