VITO-Transformer: A Visual-Tactile Fusion Network for Object Recognition

Baojiang Li, Jibo Bai, Shengjie Qiu, Haiyan Wang, Yuting Guo
DOI: https://doi.org/10.1109/tim.2023.3326241
IF: 5.6
2023-11-07
IEEE Transactions on Instrumentation and Measurement
Abstract: Outstanding advances have been made in visual learning methods for object recognition. However, machine vision recognition methods would lose their effectiveness when objects are visually indistinguishable. Since object tactile learning can access information that is not available for visual learning, it provides an important alternative form for object recognition. As a result, methods that integrate visual and tactile learning to recognize objects have been explored. There is a clear gap between visual and tactile information, and this limitation becomes more and more prominent with the development of visual-tactile learning. Most existing visual-tactile fusion learning methods lack effective fusion mechanisms to handle different tactile information types and lack sufficient accuracy to meet practical industrial needs. In this article, we propose a visual-tactile fusion network (VITO-Transformer) for object recognition to cope with these problems. Specifically, we design a special mechanism that can fuse visual and tactile information based on the transformer network to solve the problem that it is difficult to fuse visual and tactile information due to their large differences. Thanks to this special fusion mechanism, the accuracy of object recognition is substantially improved. Finally, a large number of comparative experiments are conducted on publicly available and self-made visual-tactile datasets to verify the advantages of the proposed VITO-Transformer and validate the effectiveness of the proposed fusion mechanism by comparing it with the current popular network algorithms. In this article, the proposed VITO-Transformer network can process different tactile information through a special tactile fusion mechanism, which brings a new solution to the field of visual-tactile fusion development.
engineering, electrical & electronic; instruments & instrumentation