Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning

Namiko Saito, Joao Moura, Hiroki Uchida, Sethu Vijayakumar
2024-03-16
Abstract: Recognising the characteristics of objects while a robot handles them is crucial for adjusting motions that ensure stable and efficient interactions with containers. As a step towards stable and efficient robot motions for handling and transferring containers, this work aims to recognise latent, unobservable object characteristics. While vision is commonly used for object recognition by robots, it is ineffective for detecting hidden objects. However, recognising objects indirectly using other sensors is a challenging task. To address this challenge, we propose a cross-modal transfer learning approach from vision to haptic-audio. We initially train the model with vision, directly observing the target object. Subsequently, we transfer the latent space learned from vision to a second module, trained only with haptic-audio and motor data. This transfer learning framework facilitates the representation of object characteristics using indirect sensor data, thereby improving recognition accuracy. To evaluate the recognition accuracy of our proposed learning framework, we selected shape, position, and orientation as the object characteristics. Finally, we demonstrate online recognition of both trained and untrained objects using the humanoid robot Nextage Open.
Robotics, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses how a robot can interact stably and efficiently with objects inside a container while handling or transferring it, especially when those objects are hidden by the container's lid and cannot be observed directly. The robot needs to adapt its motions so that it handles the contents safely, for example by preventing objects from toppling over, mixing, or being damaged by impacts. To make this possible, the paper aims to recognise latent, unobservable characteristics of the objects inside the container.

To tackle this challenge, the authors propose a cross-modal transfer learning approach from vision to haptic-audio. The model is first trained with visual information, which directly observes the target object's characteristics. The latent space learned from vision is then transferred to a second module, which is trained only with haptic-audio and motor data. As a result, characteristics such as shape, position, and orientation can be recognised indirectly even when the container hides the objects.

The research contributions are:
1. A two-stage learning framework for recognising latent object characteristics.
2. A demonstration that initialising ("warm-starting") the latent state of the haptic-audio module in the second stage with the latent state learned by the vision module significantly improves recognition of visually defined characteristics. This improvement holds when only indirect haptic-audio and motor sensing is available.
3. Validation of the framework on a physical humanoid robot (Nextage Open), achieving online prediction of object shape, position, and orientation.

The method focuses in particular on recognising object characteristics dynamically while the robot performs a task. This is essential for generating motions that depend on those characteristics.
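The two-stage idea (learn a latent space from vision, then train a haptic-audio/motor module to use that same latent space) can be illustrated with a deliberately simplified linear sketch. This is not the authors' model, and several parts of it are assumptions made only for illustration:
- The data is synthetic.
- A PCA projection stands in for the stage-1 vision encoder.
- Least-squares fits replace neural-network training.
- The transfer is implemented by regressing the haptic-audio/motor encoder onto the frozen vision latent codes and reusing the vision decoder.

The function names `stage1_vision`, `stage2_haptic` and `predict_from_haptic` are hypothetical.

```python
import numpy as np

# Hypothetical synthetic data (NOT from the paper): a 2-D object
# characteristic y (think of a planar position) that "vision" observes
# almost directly and "haptic-audio + motor" features observe only
# indirectly and with more noise.
rng = np.random.default_rng(0)
N, D_VIS, D_HAP, K = 400, 8, 6, 2  # samples, feature sizes, latent size

y = rng.normal(size=(N, 2))
vision = y @ rng.normal(size=(2, D_VIS)) + 0.01 * rng.normal(size=(N, D_VIS))
haptic = y @ rng.normal(size=(2, D_HAP)) + 0.1 * rng.normal(size=(N, D_HAP))


def stage1_vision(vision, y, k=K):
    """Stage 1 (assumed form): learn a vision latent space and a decoder
    from that latent to the characteristic. PCA stands in for the
    paper's neural encoder."""
    v_mean, y_mean = vision.mean(axis=0), y.mean(axis=0)
    _, _, vt = np.linalg.svd(vision - v_mean, full_matrices=False)
    enc_v = vt[:k].T                              # (D_VIS, k) projection
    z_vis = (vision - v_mean) @ enc_v             # zero-mean vision latent codes
    dec, *_ = np.linalg.lstsq(z_vis, y - y_mean, rcond=None)  # (k, 2) decoder
    return enc_v, dec, y_mean, z_vis


def stage2_haptic(haptic, z_vis):
    """Stage 2 (assumed form): fit a haptic-audio/motor encoder whose
    output matches the frozen vision latent codes (latent transfer)."""
    h_mean = haptic.mean(axis=0)
    enc_h, *_ = np.linalg.lstsq(haptic - h_mean, z_vis, rcond=None)  # (D_HAP, k)
    return enc_h, h_mean


enc_v, dec, y_mean, z_vis = stage1_vision(vision, y)
enc_h, h_mean = stage2_haptic(haptic, z_vis)


def predict_from_haptic(h):
    """Vision-free recognition: haptic features -> transferred latent ->
    decoder learned in stage 1."""
    z = (h - h_mean) @ enc_h
    return z @ dec + y_mean


y_hat = predict_from_haptic(haptic)
rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
print(f"haptic-only RMSE on training data: {rmse:.3f}")
```

In this sketch, the second stage never fits the labels `y` directly. It only needs paired vision and haptic recordings during training, and the decoder it reuses was learned from vision. At prediction time, `predict_from_haptic` runs without any visual input, which mirrors the situation where a lid hides the object.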