Abstract:Robots must be able to understand their surroundings to perform complex tasks in challenging environments and many of these complex tasks require estimates of physical properties such as friction or weight. Estimating such properties using learning is challenging due to the large amounts of labelled data required for training and the difficulty of updating these learned models online at run time. To overcome these challenges, this paper introduces a novel, multi-modal approach for representing semantic predictions and physical property estimates jointly in a probabilistic manner. By using conjugate pairs, the proposed method enables closed-form Bayesian updates given visual and tactile measurements without requiring additional training data. The efficacy of the proposed algorithm is demonstrated through several hardware experiments. In particular, this paper illustrates that by conditioning semantic classifications on physical properties, the proposed method quantitatively outperforms state-of-the-art semantic classification methods that rely on vision alone. To further illustrate its utility, the proposed method is used in several applications including to represent affordance-based properties probabilistically and a challenging terrain traversal task using a legged robot. In the latter task, the proposed method represents the coefficient of friction of the terrain probabilistically, which enables the use of an on-line risk-aware planner that switches the legged robot from a dynamic gait to a static, stable gait when the expected value of the coefficient of friction falls below a given threshold. Videos of these case studies as well as the open-source C++ and ROS interface can be found at <a class="link-external link-https" href="https://roahmlab.github.io/multimodal_mapping/" rel="external noopener nofollow">this https URL</a>.

Multimodal representation models for prediction and control from partial information

Multimodal integration learning of robot behavior using deep neural networks

Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors

Probabilistic Multimodal Modeling for Human-Robot Interaction Tasks

Bridging vision and touch: advancing robotic interaction prediction with self-supervised multimodal learning

Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning

iCub! Do you recognize what I am doing?: multimodal human action recognition on multisensory-enabled iCub robot

Tell and show: Combining multiple modalities to communicate manipulation tasks to a robot

Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective

Multimodal Representation Learning for Place Recognition Using Deep Hebbian Predictive Coding

A Multimodal Information Fusion Model for Robot Action Recognition with Time Series

Learning Unseen Modality Interaction

Generative Modeling of Multimodal Multi-Human Behavior

You've Got to Feel It To Believe It: Multi-Modal Bayesian Inference for Semantic and Property Prediction

Learning Sequential Latent Variable Models from Multimodal Time Series Data

Learning Self-Awareness for Autonomous Vehicles: Exploring Multisensory Incremental Models

Learning Deep Features for Robotic Inference from Physical Interactions.

What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?

Multi-modal perception for soft robotic interactions using generative models

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

Multimodal Trajectory Prediction for Diverse Vehicle Types in Autonomous Driving with Heterogeneous Data and Physical Constraints