Off-policy Imitation Learning from Visual Inputs

Zhihao Cheng,Li Shen,Dacheng Tao
DOI: https://doi.org/10.1109/icra48891.2023.10161566
2023-01-01
Abstract:Recently, various successful applications utilizing expert states in imitation learning (IL) have been witnessed. However, IL from visual inputs (ILfVI), which has a greater promise to be widely applied by using online visual resources, suffers from low data-efficiency and poor performance resulted from on-policy learning and high-dimensional visual inputs. We propose OPIfVI (Off-Policy Imitation from Visual Inputs), which is composed of an off-policy learning manner, data augmentation, and encoder techniques, to tackle the mentioned challenges, respectively. More specifically, to improve data-efficiency, OPIfVI conducts IL in an off-policy manner, with which sampled data used multiple times. In addition, we enhance the stability of OPIfVI with spectral normalization to mitigate the side effect of off-policy training. The core factor, contributing to the poor performance of ILfVI, that we think is agents could not extract meaningful features from visual inputs. Hence, OPIfVI employs data augmentation from computer vision to help train encoders to better extract features from visual inputs. Besides, a specific structure of gradient backpropagation for the encoder is designed to stabilize the encoder training. At last, we demonstrate that OPIfVI can achieve expert-level performance and outperform existing baselines via extensive experiments using DeepMind Control Suite.
What problem does this paper attempt to address?