Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang
2024-07-31
Abstract: Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/.
Robotics, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses how a robot can effectively integrate visual and tactile information when performing contact-rich manipulation tasks. It proposes a system called Robot Synesthesia, which performs dexterous tasks such as in-hand object rotation by combining visual and tactile inputs. The system represents tactile data as a point cloud and unifies it with the visual data in the same three-dimensional space, so that a single representation carries the information from both modalities and gives the policy richer spatial information for reasoning about actions. The research also examines how to transfer policies trained in simulation to a real robot hand, both to solve complex tasks such as dual-ball synchronous rotation and to generalize to unseen objects. The experimental results indicate that this method outperforms single-modality methods on various benchmark tasks and also performs well in real-world deployment.
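The idea of placing touch and vision in one three-dimensional point cloud can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_visuotactile`, the binary contact flags, the assumption that tactile sensor positions are already known in the camera frame, and the two-channel one-hot label are all hypothetical choices made for illustration.

```python
import numpy as np


def fuse_visuotactile(camera_points, sensor_positions, contact_flags):
    """Merge camera and tactile points into one labeled point cloud.

    All names and shapes here are illustrative assumptions.
    camera_points:    (N, 3) xyz points from a depth camera.
    sensor_positions: (M, 3) xyz positions of tactile sensors, same frame.
    contact_flags:    (M,) booleans, True where a sensor reports contact.
    Returns an (N + K, 5) array: xyz plus a one-hot [visual, tactile]
    label, where K is the number of sensors currently in contact.
    """
    camera_points = np.asarray(camera_points, dtype=np.float32).reshape(-1, 3)
    sensor_positions = np.asarray(sensor_positions, dtype=np.float32).reshape(-1, 3)
    contact_flags = np.asarray(contact_flags, dtype=bool)

    # Only sensors in contact contribute tactile points.
    touched = sensor_positions[contact_flags]

    # Append a one-hot source label so a network can tell the modalities apart.
    visual = np.hstack([camera_points, np.tile([1.0, 0.0], (len(camera_points), 1))])
    tactile = np.hstack([touched, np.tile([0.0, 1.0], (len(touched), 1))])
    return np.vstack([visual, tactile]).astype(np.float32)


# Example: 4 camera points and 3 sensors, 2 of which are in contact.
camera = np.zeros((4, 3))
sensors = np.array([[0.1, 0.0, 0.0], [0.2, 0.0, 0.0], [0.3, 0.0, 0.0]])
cloud = fuse_visuotactile(camera, sensors, [True, False, True])
print(cloud.shape)  # (6, 5)
```

Because both modalities end up as points in one array, a single point cloud encoder could consume them together, which is the kind of unified input the summary describes.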