Learning Deep Features for Robotic Inference from Physical Interactions.

Atabak Dehban, Shanghang Zhang, Nino Cauli, Lorenzo Jamone, Jose Santos
DOI: https://doi.org/10.1109/tcds.2022.3152383
IF: 4.546
2023-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: To effectively handle multiple tasks that are not predefined, a robotic agent needs to automatically map its high-dimensional sensory inputs into useful features. As a solution, feature learning has empirically shown substantial improvements over feature engineering approaches in obtaining representations that generalize to different tasks, but it requires a large amount of data and computational capacity. These challenges are especially relevant in robotics because of the low signal-to-noise ratios inherent to robotic data and the cost typically associated with collecting this type of input. In this article, we propose a deep probabilistic method based on convolutional variational autoencoders (CVAEs) to learn visual features suitable for interaction and recognition tasks. We run our experiments on a self-supervised robotic sensorimotor data set. Our data were acquired with the iCub humanoid and are based on a standard object collection, which makes the data set readily extensible. We evaluated the learned features in terms of usability for: 1) object recognition; 2) capturing the statistics of the effects; and 3) planning. In addition, where applicable, we compared the performance of the proposed architecture with other state-of-the-art models. These experiments demonstrate that our model is capable of capturing the functional statistics of action and perception (i.e., images) and that it performs better than existing baselines, without requiring millions of samples or any hand-engineered features.