Energy-Based Imitation Learning

Minghuan Liu, Tairan He, Minkai Xu, Weinan Zhang
DOI: https://doi.org/10.5555/3463952.3464049
2020-01-01
Abstract: We tackle a common scenario in imitation learning (IL), where agents try to recover the optimal policy from expert demonstrations without further access to the expert or environment reward signals. Apart from simple Behavior Cloning (BC), which adopts supervised learning and suffers from compounding error, previous solutions such as inverse reinforcement learning (IRL) and recent generative adversarial methods involve a bi-level or alternating optimization that updates the reward function and the policy, and therefore suffer from high computational cost and training instability. Inspired by recent progress in energy-based models (EBMs), we propose a simplified IL framework named Energy-Based Imitation Learning (EBIL). Instead of updating the reward and policy iteratively, EBIL breaks out of the traditional IRL paradigm with a simple and flexible two-stage solution: first estimating the expert energy as a surrogate reward function through score matching, then using that reward to learn the policy with reinforcement learning algorithms. EBIL combines the ideas of EBMs and occupancy measure matching, and through theoretical analysis we show that EBIL and Max-Entropy IRL (MaxEnt IRL) approaches are two sides of the same coin, so EBIL can serve as an alternative to adversarial IRL methods. Extensive qualitative and quantitative experiments indicate that EBIL recovers meaningful and interpretable reward signals while achieving effective and comparable performance against existing algorithms on IL benchmarks.
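The first stage of the two-stage solution can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a denoising variant of score matching (the abstract only says "score matching"), a hypothetical diagonal quadratic energy in place of a neural network, and a surrogate reward equal to the negative energy. The names `fit_energy_dsm` and `surrogate_reward` are invented for this illustration.

```python
import numpy as np


def fit_energy_dsm(expert_x, sigma=0.3, steps=2000, lr=0.05, seed=0):
    """Fit E(x) = 0.5 * sum(prec * (x - mu)**2) by denoising score matching.

    Assumption: a diagonal quadratic energy stands in for a learned network.
    """
    rng = np.random.default_rng(seed)
    n, d = expert_x.shape
    mu = np.zeros(d)
    log_prec = np.zeros(d)  # optimize log-precision so precision stays positive
    for _ in range(steps):
        noise = rng.standard_normal((n, d))
        x_noisy = expert_x + sigma * noise
        prec = np.exp(log_prec)
        # Model score is -dE/dx evaluated at the perturbed sample.
        score = -prec * (x_noisy - mu)
        # Score of the Gaussian perturbation kernel: -(x_noisy - x) / sigma**2.
        target = -noise / sigma
        resid = score - target
        # Gradients of 0.5 * mean(sum(resid**2)) with respect to mu and log_prec.
        grad_mu = np.mean(resid * prec, axis=0)
        grad_log_prec = np.mean(resid * score, axis=0)
        mu -= lr * grad_mu
        log_prec -= lr * grad_log_prec
    return mu, np.exp(log_prec)


def surrogate_reward(x, mu, prec):
    """Assumed reward: negative energy, so expert-like inputs score higher."""
    return -0.5 * np.sum(prec * (x - mu) ** 2, axis=-1)


# Hypothetical "expert" state-action samples clustered around (1, -1).
demo_rng = np.random.default_rng(1)
expert_samples = demo_rng.normal([1.0, -1.0], 0.5, size=(2000, 2))
mu_hat, prec_hat = fit_energy_dsm(expert_samples)
near = surrogate_reward(np.array([1.0, -1.0]), mu_hat, prec_hat)
far = surrogate_reward(np.array([3.0, 2.0]), mu_hat, prec_hat)
print(near > far)
```

In the second stage, the fitted reward would be held fixed and passed to any standard reinforcement learning algorithm in place of the environment reward, so the reward and policy are never updated in alternation.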