Sample-Efficient Policy Learning Based on Completely Behavior Cloning.

Qiming Zou, Ling Wang, Yu Li, Jie Liu
DOI: https://doi.org/10.1109/smc.2019.8914085
2018-01-01
Abstract: Direct policy search is one of the most important classes of algorithms in reinforcement learning. However, learning from scratch requires a large amount of experience data and is prone to poor local optima. To overcome these challenges, this paper proposes a training-free behavior cloning algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC transforms a Model Predictive Control (MPC) controller into a PieceWise Affine (PWA) function using multi-parametric programming, and uses a neural network to represent this function. In this way, off-the-shelf deep reinforcement learning algorithms can be used to fine-tune the neural network. Experiments show that the method helps the agent learn in the high-reward state region and converge faster and to a better result.
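The idea of expressing a PWA control law as a neural network without any training can be illustrated with a toy case. The sketch below is not the paper's construction, which the abstract does not detail. It assumes a hypothetical one-dimensional controller, a saturated linear feedback law with three affine regions, and sets the weights of a two-unit ReLU network by hand so that the network reproduces the law exactly. The gain `K` and the limit `U_MAX` are made-up values chosen only for illustration.

```python
import numpy as np

K = 0.8       # hypothetical feedback gain (assumption, not from the paper)
U_MAX = 1.0   # hypothetical actuator limit (assumption, not from the paper)


def pwa_controller(x):
    """Three-region PWA law: linear feedback -K*x saturated at +/- U_MAX."""
    return np.clip(-K * x, -U_MAX, U_MAX)


# One hidden layer with two ReLU units. The weights are written down
# analytically from the identity
#   clip(z, -m, m) = -m + relu(z + m) - relu(z - m),  with z = -K*x, m = U_MAX,
# so no gradient-based training is involved.
W1 = np.array([[-K], [-K]])      # hidden weights, shape (2, 1)
b1 = np.array([U_MAX, -U_MAX])   # hidden biases, shape (2,)
W2 = np.array([[1.0, -1.0]])     # output weights, shape (1, 2)
b2 = np.array([-U_MAX])          # output bias, shape (1,)


def relu_network(x):
    """Evaluate the hand-built ReLU network on a scalar or 1-D array of states."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    hidden = np.maximum(0.0, W1 @ x[None, :] + b1[:, None])  # shape (2, n)
    return (W2 @ hidden + b2[:, None]).ravel()               # shape (n,)


if __name__ == "__main__":
    states = np.linspace(-5.0, 5.0, 101)
    # The network and the PWA law agree on every sampled state.
    print(np.allclose(relu_network(states), pwa_controller(states)))
```

Because the network matches the controller from the start, a reinforcement learning algorithm that later fine-tunes `W1`, `b1`, `W2`, and `b2` would begin from the controller's behavior rather than from random weights. For a real MPC problem, the regions and affine gains would come from a multi-parametric programming solver rather than being written by hand.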