Abstract: To perform fine manipulation tasks more actively in the real world, intelligent robots should be able to understand and communicate the physical attributes of a material while interacting with an object. Touch and vision are two important sensing modalities in a robotic perception system. In this article, we propose a cross-modal material perception framework for recognizing novel objects. Concretely, it first adopts an object-agnostic method to associate information from the tactile and visual modalities. It then recognizes a novel object by using its tactile signal to retrieve perceptually similar surface material images through the learned cross-modal correlation. This problem is challenging because data from the visual and tactile modalities are highly heterogeneous and weakly paired. Moreover, the framework should not only consider cross-modal pairwise relevance but also be discriminative and generalize to unseen objects. To this end, we propose a weakly paired cross-modal adversarial learning (WCMAL) model for visual–tactile cross-modal retrieval, which combines the advantages of deep learning and adversarial learning. In particular, the model explicitly accounts for the weak pairing between the two modalities. Finally, we conduct verification experiments on a publicly available data set. The results demonstrate the effectiveness of the proposed method. Note to Practitioners: Since cross-modal perception can improve the active operation of automation systems, it is invaluable for industrial intelligence, particularly in applications where one sensing modality cannot be used or is unsuitable. In this article, we provide a framework of cross-modal material perception for object recognition based on the idea of cross-modal retrieval. Concretely, we use the tactile data of an unknown object to retrieve perceptually similar surface images, which are used to evaluate its material properties. 
Unlike previous works that use tactile information as a complement or alternative to visual information to recognize specific objects, our proposed framework can estimate and infer the material properties of both seen and unseen objects, which can enhance the intelligence of manipulation systems and improve the quality of interaction. In future work, more modalities will be incorporated to further enhance cross-modal material perception.
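
The retrieval step described above (using a tactile signal to retrieve perceptually similar surface images) can be illustrated with a minimal sketch. It assumes that both modalities have already been mapped into a shared embedding space by some learned encoders; the function names, the toy embeddings, and the use of cosine similarity are illustrative assumptions and are not taken from the WCMAL model itself.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps avoids division by zero)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_images(tactile_embedding, image_embeddings, top_k=3):
    """Rank gallery images by cosine similarity to one tactile query.

    Both inputs are assumed (hypothetically) to live in the same
    shared embedding space produced by previously trained encoders.
    """
    query = l2_normalize(np.asarray(tactile_embedding, dtype=float))
    gallery = l2_normalize(np.asarray(image_embeddings, dtype=float))
    scores = gallery @ query              # cosine similarity per gallery image
    order = np.argsort(-scores)[:top_k]   # indices of the best matches first
    return order, scores[order]


# Toy data: four surface-image embeddings and one tactile query (made up).
image_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
tactile_embedding = np.array([0.9, 0.1, 0.0])

indices, scores = retrieve_images(tactile_embedding, image_embeddings, top_k=2)
print(indices, scores)
```

In this toy setting the query lies closest to the first gallery image, so that image is ranked first. In practice, the material labels attached to the retrieved images would then be used to estimate the material properties of the touched object.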

Deep Active Cross-Modal Visuo-Tactile Transfer Learning for Robotic Object Recognition

Vision-Based Robotic Object Grasping—A Deep Reinforcement Learning Approach

Cross-Modal Material Perception for Novel Objects: A Deep Adversarial Learning Method

End-to-End ConvNet for Tactile Recognition Using Residual Orthogonal Tiling and Pyramid Convolution Ensemble

Transfer of Learning from Vision to Touch: A Hybrid Deep Convolutional Neural Network for Visuo-Tactile 3D Object Recognition

Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning

Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces

Adaptive visual–tactile fusion recognition for robotic operation of multi-material system

OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

Enhanced robotic tactile perception with spatiotemporal sensing and logical reasoning for robust object recognition

Drop to Transfer: Learning Transferable Features for Robot Tactile Material Recognition in Open Scene

VITO-Transformer: A Visual-Tactile Fusion Network for Object Recognition

Gradient adaptive sampling and multiple temporal scale 3D CNNs for tactile object recognition

Visuo-Tactile based Predictive Cross Modal Perception for Object Exploration in Robotics

Lifelong Visual-Tactile Cross-Modal Learning for Robotic Material Perception

Deep learning-based method for vision-guided robotic grasping of unknown objects

End-to-End Active Object Tracking and Its Real-World Deployment Via Reinforcement Learning

A Framework for Sensorimotor Cross-Perception and Cross-Behavior Knowledge Transfer for Object Categorization

Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training

“Touching to See” and “Seeing to Feel”: Robotic Cross-modal Sensory Data Generation for Visual-Tactile Perception

Bridging realities: training visuo-haptic object recognition models for robots using 3D virtual simulations