Slip Detection through Fusion of Visual-Tactile Data using Swin Transformer V2

Mingyu Shangguan, Yang Li
DOI: https://doi.org/10.1109/ICICM59499.2023.10365811
2023-10-20
Abstract: This paper presents an approach to enhance the stability of manipulator grasping tasks using a Swin Transformer V2 network model. The focus is on fusing GelSight sensor single-modal vision, tactile data, and multi-modal information to improve the manipulator’s perception in complex environments. The Swin Transformer V2 model is chosen for its strong performance in image understanding. The paper explains how unimodal visual and tactile data are input to the network for feature extraction, followed by a fusion strategy that combines the modalities. The proposed method is applied to a manipulator grasp slip detection task, where multi-modal perception of the environment improves stability and accuracy. Experimental validation and comparisons demonstrate the superiority of the approach and its potential for enhancing grasping stability.
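The abstract does not specify the fusion strategy, so the following is only a minimal sketch of one common option, feature-level fusion by concatenation followed by a logistic slip classifier. The function names `fuse_features` and `slip_probability`, the feature sizes, and the untrained weights are all hypothetical; the random vectors merely stand in for features that a per-modality backbone such as Swin Transformer V2 would produce.

```python
import math
import random


def fuse_features(visual, tactile):
    """Feature-level fusion by concatenation (an assumed strategy, not the paper's)."""
    return list(visual) + list(tactile)


def slip_probability(fused, weights, bias):
    """Logistic head mapping a fused feature vector to a slip probability."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder feature vectors standing in for per-modality backbone outputs.
rng = random.Random(0)
visual_feat = [rng.uniform(-1, 1) for _ in range(4)]
tactile_feat = [rng.uniform(-1, 1) for _ in range(4)]

fused = fuse_features(visual_feat, tactile_feat)
weights = [0.1] * len(fused)  # hypothetical, untrained weights
p_slip = slip_probability(fused, weights, bias=0.0)
print(f"fused length = {len(fused)}, p(slip) = {p_slip:.3f}")
```

In a real system the weights would be learned jointly with the backbones, and other fusion schemes (for example attention-based or decision-level fusion) could replace the concatenation step.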
Computer Science, Engineering