Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning

Namiko Saito, Joao Moura, Hiroki Uchida, Sethu Vijayakumar

2024-03-16

Abstract: Recognising the characteristics of objects while a robot handles them is crucial for adjusting motions that ensure stable and efficient interactions with containers. Ahead of realising stable and efficient robot motions for handling/transferring the containers, this work aims to recognise the latent unobservable object characteristics. While vision is commonly used for object recognition by robots, it is ineffective for detecting hidden objects. However, recognising objects indirectly using other sensors is a challenging task. To address this challenge, we propose a cross-modal transfer learning approach from vision to haptic-audio. We initially train the model with vision, directly observing the target object. Subsequently, we transfer the latent space learned from vision to a second module, trained only with haptic-audio and motor data. This transfer learning framework facilitates the representation of object characteristics using indirect sensor data, thereby improving recognition accuracy. To evaluate the recognition accuracy of our proposed learning framework, we selected shape, position, and orientation as the object characteristics. Finally, we demonstrate online recognition of both trained and untrained objects using the humanoid robot Nextage Open.

Robotics, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper addresses how a robot can handle or transfer a container stably and efficiently when the objects inside are hidden, for example by a lid, and cannot be observed directly. Adjusting the robot's motion to suit the contents (for example, to keep objects from toppling over, mixing, or being damaged by impact) first requires recognising the latent, unobservable characteristics of those objects. To this end, the authors propose a cross-modal transfer learning approach from vision to haptic-audio. The model is first trained with vision, which observes the target object's characteristics directly; the latent space learned from vision is then transferred to a second module, which is trained using only haptic-audio and motor data. In this way, characteristics such as shape, position, and orientation can be recognised indirectly even when the container hides the object.

The research contributions are:
1. A two-stage learning framework for recognising latent object characteristics.
2. A demonstration that initialising ("warm-starting") the latent state of the haptic-audio module in the second stage with the latent state learned by the vision module significantly improves recognition of visual characteristics from indirect haptic-audio and motor sensing alone.
3. Validation of the framework on a physical humanoid robot platform, with online prediction of the object's shape, position, and orientation.

The method focuses on recognising object characteristics dynamically while the robot performs a task, which is crucial for generating motions that depend on those characteristics.
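The two-stage idea described above can be illustrated with a deliberately simplified sketch. It is not the authors' architecture: it assumes linear models, synthetic data, PCA as the stand-in for the vision latent space, and least-squares regression as the stand-in for training the haptic-audio and motor module. All names (`traits`, `vision`, `haptic_audio_motor`, `fit_vision_latent`, `fit_linear`, `apply_linear`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): three hidden object characteristics
# are seen directly by "vision" and only indirectly by "haptic-audio + motor".
n = 200
traits = rng.normal(size=(n, 3))
vision = traits @ rng.normal(size=(3, 16)) + 0.01 * rng.normal(size=(n, 16))
haptic_audio_motor = traits @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(n, 8))


def fit_vision_latent(x, dim):
    """Stage 1 stand-in: learn a latent space from vision via PCA (SVD)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:dim].T  # mean and (features x dim) projection basis


def fit_linear(inputs, targets):
    """Least-squares linear map with a bias term."""
    design = np.hstack([inputs, np.ones((len(inputs), 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def apply_linear(inputs, weights):
    design = np.hstack([inputs, np.ones((len(inputs), 1))])
    return design @ weights


train, test = slice(0, 150), slice(150, n)

# Stage 1: vision defines the latent space and a readout to characteristics.
mean, basis = fit_vision_latent(vision[train], dim=3)
z_vision = (vision[train] - mean) @ basis
readout = fit_linear(z_vision, traits[train])

# Stage 2: a second module sees only haptic-audio + motor data and is fitted
# to reproduce the latent learned from vision (the "transfer" step).
ham_encoder = fit_linear(haptic_audio_motor[train], z_vision)

# Online use: vision is unavailable, so characteristics are predicted from
# the indirect signals through the transferred latent space.
z_pred = apply_linear(haptic_audio_motor[test], ham_encoder)
traits_pred = apply_linear(z_pred, readout)
rmse = float(np.sqrt(np.mean((traits_pred - traits[test]) ** 2)))
print(f"held-out RMSE from indirect sensing: {rmse:.4f}")
```

In this toy setting the second module never sees vision at prediction time; it only inherits the latent space that vision shaped during training. A real system would replace the linear maps with sequence models over time-series sensor data.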

Deep Active Cross-Modal Visuo-Tactile Transfer Learning for Robotic Object Recognition

Transferring Implicit Knowledge of Non-Visual Object Properties Across Heterogeneous Robot Morphologies

How to select and use tools? : Active Perception of Target Objects Using Multimodal Deep Learning

Enhanced robotic tactile perception with spatiotemporal sensing and logical reasoning for robust object recognition

Cross-Tool and Cross-Behavior Perceptual Knowledge Transfer for Grounded Object Recognition

Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition

Adaptive visual–tactile fusion recognition for robotic operation of multi-material system

Deep Neural Object Analysis by Interactive Auditory Exploration with a Humanoid Robot

A Framework for Sensorimotor Cross-Perception and Cross-Behavior Knowledge Transfer for Object Categorization

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

Simple Kinesthetic Haptics for Object Recognition

Machine learning and Sensor-Based Multi-Robot System with Voice Recognition for Assisting the Visually Impaired

Bayesian and Neural Inference on LSTM-based Object Recognition from Tactile and Kinesthetic Information

Multi-target objects and complex color recognition model based on humanoid robot

Robot to Human Object Handover using Vision and Joint Torque Sensor Modalities

Multimodal integration learning of robot behavior using deep neural networks

Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces

OBT-Trace: An approach to trace and recognition object motion through prosthetic hand gestures

Open-Ended Fine-Grained 3D Object Categorization by Combining Shape and Texture Features in Multiple Colorspaces

Bridging realities: training visuo-haptic object recognition models for robots using 3D virtual simulations