Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Yao Mu, Baiyu Peng, Ziqing Gu, Shengbo Eben Li, Chang Liu, Bingbing Nie, Jianfeng Zheng, Bo Zhang
DOI: https://doi.org/10.23919/iccas50221.2020.9268413
2020-01-01
Abstract: Reinforcement learning has the potential to control stochastic nonlinear systems optimally. We propose a mixed reinforcement learning (mixed RL) algorithm that simultaneously uses dual representations of the environmental dynamics to search for the optimal policy. The dual representation comprises an empirical dynamic model and a set of state-action data. The former can embed the designer's knowledge and reduce the difficulty of learning, while the latter can compensate for model inaccuracy because it reflects the real system dynamics accurately. Such a design can improve both learning accuracy and training speed. In the mixed RL framework, the additive uncertainty of the stochastic model is compensated using explored state-action data via an iterative Bayesian estimator (IBE). The optimal policy is then computed iteratively by alternating between policy evaluation (PEV) and policy improvement (PIM). The effectiveness of mixed RL is demonstrated on a typical optimal control problem for stochastic non-affine nonlinear systems (i.e., a double lane change task with an automated vehicle).
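The abstract does not spell out the form of the IBE, so the following is only a minimal sketch of the general idea: using explored transitions to refine an estimate of the additive uncertainty left over by an empirical model. It assumes a scalar state, Gaussian additive noise with known variance, and a conjugate normal prior on the noise mean. The names `empirical_model` and `ibe_update`, the model form, and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def empirical_model(x, u):
    # Hypothetical designer-supplied model (an assumption, not the paper's vehicle model).
    return 0.9 * x + 0.5 * np.tanh(u)


def ibe_update(prior_mean, prior_var, residuals, noise_var):
    """One Bayesian update of the additive-uncertainty mean.

    residuals are observed next states minus the empirical model's predictions.
    Assumes Gaussian noise with known variance and a normal prior (conjugate case).
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + residuals.sum() / noise_var)
    return post_mean, post_var


# Simulated "real" system: empirical model plus an unknown bias and noise.
rng = np.random.default_rng(0)
true_bias, noise_var = 0.3, 0.04

mean, var = 0.0, 1.0  # prior belief about the additive-uncertainty mean
x = 0.0
for _ in range(20):  # each iteration: explore, then refine the estimate
    batch = []
    for _ in range(10):
        u = rng.uniform(-1.0, 1.0)
        predicted = empirical_model(x, u)
        x_next = predicted + true_bias + rng.normal(0.0, np.sqrt(noise_var))
        batch.append(x_next - predicted)
        x = x_next
    # The posterior from this batch becomes the prior for the next one.
    mean, var = ibe_update(mean, var, batch, noise_var)
```

In a mixed RL loop, the corrected prediction `empirical_model(x, u) + mean` would be what the PEV and PIM steps use in place of the raw empirical model; that coupling is left out here to keep the sketch short.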