VTFEFN: An End-to-End Visual-Tactile Feature Extraction and Fusion Network

Yuanpei Zhang, Jie Hao, Z. Zou, L. Shu, Shuai Tian
DOI: https://doi.org/10.1109/ISCIPT61983.2024.10672780
2024-05-24
Abstract: While visual-tactile fusion methods are increasingly used in robotic perception and manipulation, efficiently extracting and fusing features from visual and tactile data remains a significant challenge. To tackle this problem, we propose a Visual-Tactile Feature Extraction and Fusion Network (VTFEFN). We employ a dual-branch structure to extract visual and tactile features separately, and the two sets of features are subsequently fused at the feature level. We leverage convolutional neural networks for local feature extraction and an attention mechanism to capture long-term dependencies. Specifically, we fine-tune a pre-trained convolutional neural network to extract visual features, while employing multi-head self-attention and a one-dimensional convolutional neural network for tactile feature extraction. To assess the efficacy of this model, we construct a visual-tactile joint dataset and conduct object classification experiments. Extensive experiments demonstrate that our approach achieves superior results compared to single-modal methods.
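The dual-branch design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify layer sizes, the fusion operator, or the backbone, so everything here is an assumption. In particular, the visual branch is a stand-in that only pools a feature map (in place of a fine-tuned pre-trained CNN), fusion is assumed to be plain concatenation, weights are random, and all names (`tactile_branch`, `visual_branch`, `fuse_and_classify`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (T, D) tactile sequence. Returns a (T, D) sequence."""
    t, d = x.shape
    head_dim = d // num_heads

    def split(m):
        # (T, D) -> (num_heads, T, head_dim)
        return m.reshape(t, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (H, T, T)
    out = softmax(scores, axis=-1) @ v                      # (H, T, head_dim)
    out = out.transpose(1, 0, 2).reshape(t, d)              # merge heads
    return out @ w_o

def conv1d(x, kernels):
    """Valid 1-D convolution over time. x: (T, D); kernels: (K, D, C_out)."""
    k = kernels.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    return np.einsum("tkd,kdc->tc", windows, kernels)       # (T-K+1, C_out)

def tactile_branch(x, params, num_heads=4):
    # Attention for long-range dependencies, then 1-D conv for local patterns.
    h = multi_head_self_attention(x, *params["attn"], num_heads)
    h = np.maximum(conv1d(h, params["conv"]), 0.0)          # ReLU
    return h.mean(axis=0)                                   # pool over time

def visual_branch(feature_map):
    # ASSUMPTION: stand-in for a fine-tuned pre-trained CNN; it only applies
    # global average pooling to a (C, H, W) feature map.
    return feature_map.mean(axis=(1, 2))

def fuse_and_classify(f_vis, f_tac, w_cls, b_cls):
    # ASSUMPTION: feature-level fusion modeled as concatenation.
    fused = np.concatenate([f_vis, f_tac])
    return softmax(fused @ w_cls + b_cls)

# Hypothetical sizes: 32 tactile time steps of 16 channels, 5 object classes.
T, D, C_TAC, C_VIS, N_CLS = 32, 16, 24, 64, 5
params = {
    "attn": [rng.normal(scale=0.1, size=(D, D)) for _ in range(4)],
    "conv": rng.normal(scale=0.1, size=(3, D, C_TAC)),
}
w_cls = rng.normal(scale=0.1, size=(C_VIS + C_TAC, N_CLS))
b_cls = np.zeros(N_CLS)

tactile = rng.normal(size=(T, D))           # synthetic tactile sequence
image_features = rng.normal(size=(C_VIS, 7, 7))  # synthetic visual feature map

probs = fuse_and_classify(
    visual_branch(image_features), tactile_branch(tactile, params), w_cls, b_cls
)
print(probs.shape)
```

Concatenation is only the simplest feature-level fusion to write down; the paper may use a different operator, and a trained model would learn the weights rather than sample them.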
Computer Science, Engineering