Digital Twin-Enabled Grasp Outcomes Assessment for Unknown Objects Using Visual-Tactile Fusion Perception

Zhuangzhuang Zhang,Zhinan Zhang,Lihui Wang,Xiaoxiao Zhu,Huang,Qixin Cao
DOI: https://doi.org/10.1016/j.rcim.2023.102601
IF: 10.103
2023-01-01
Robotics and Computer-Integrated Manufacturing
Abstract:Humans can instinctively predict whether a given grasp will be successful through visual and rich haptic feedback. Towards the next generation of smart robotic manufacturing, robots must be equipped with similar capabilities to cope with grasping unknown objects in unstructured environments. However, most existing data-driven methods take global visual images and tactile readings from the real-world system as input, making them incapable of predicting the grasp outcomes for cluttered objects or generating large-scale datasets. First, this paper proposes a visual-tactile fusion method to predict the results of grasping cluttered objects, which is the most common scenario for grasping applications. Concretely, the multimodal fusion network (MMFN) uses the local point cloud within the gripper as the visual signal input, while the tactile signal input is the images provided by two high-resolution tactile sensors. Second, collecting data in the real world is high-cost and time-consuming. Therefore, this paper proposes a digital twin-enabled robotic grasping system to collect large-scale multimodal datasets and investigates how to apply domain randomization and domain adaptation to bridge the sim-to-real transfer gap. Finally, extensive validation experiments are conducted in physical and virtual environments. The experimental results demonstrate the effectiveness of the proposed method in assessing grasp stability for cluttered objects and performing zero-shot sim-to-real policy transfer on the real robot with the aid of the proposed migration strategy.
What problem does this paper attempt to address?