Balancing Value Iteration and Policy Iteration for Discrete-Time Control.

Biao Luo, Yin Yang, Huai-Ning Wu, Tingwen Huang
DOI: https://doi.org/10.1109/tsmc.2019.2898389
2020-01-01
Abstract: The optimal control problem of discrete-time nonlinear systems depends on the solution of the Bellman equation. In this paper, an adaptive reinforcement learning (RL) method is developed to solve the complex Bellman equation by balancing value iteration (VI) and policy iteration (PI). By adding a balance parameter, the adaptive RL method integrates VI and PI, which accelerates VI and avoids the need for an initial admissible control. The convergence of the adaptive RL is proved by showing that its iterates converge to the solution of the Bellman equation. Subsequently, the adaptive RL is realized by using a neural network (NN) approximation of the value function, and a least-squares scheme is developed for updating the NN weights. The convergence of the NN-based adaptive RL is then proved while accounting for the NN approximation error. To further improve performance, an adaptive rule is developed for tuning the balance parameter from iteration to iteration. Finally, the effectiveness of the adaptive RL is validated with simulation studies.
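The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of interpolating between VI and PI, not the paper's algorithm. It uses classical modified policy iteration on a toy deterministic discrete-time system, where the number of evaluation sweeps `m` plays a role loosely analogous to a balance parameter: `m = 1` reduces to VI, and large `m` approaches PI. The system, the quadratic stage cost, the discount factor `gamma`, and every function and parameter name are assumptions made for illustration.

```python
def modified_policy_iteration(n_states=5, actions=(-1, 0, 1), gamma=0.9,
                              m=3, tol=1e-10, max_iter=10_000):
    """Toy modified policy iteration (hypothetical example, not the paper's method).

    m = 1 gives plain value iteration; larger m moves toward policy iteration.
    Starts from V = 0, so no initial admissible policy is supplied.
    """
    def step(s, a):
        # Assumed dynamics: move by a, clipped to the state range.
        return min(max(s + a, 0), n_states - 1)

    def cost(s, a):
        # Assumed quadratic stage cost on state and control.
        return float(s * s + a * a)

    def q_value(s, a, V):
        # One-step lookahead cost of taking a in s, then following V.
        return cost(s, a) + gamma * V[step(s, a)]

    V = [0.0] * n_states
    policy = [0] * n_states
    for it in range(1, max_iter + 1):
        # Policy improvement: greedy control with respect to the current V.
        policy = [min(actions, key=lambda a, s=s: q_value(s, a, V))
                  for s in range(n_states)]
        # Partial policy evaluation: m backups under the fixed greedy policy.
        V_new = V
        for _ in range(m):
            V_new = [q_value(s, policy[s], V_new) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            return V, policy, it
    return V, policy, max_iter


if __name__ == "__main__":
    for sweeps in (1, 3):
        V, policy, iters = modified_policy_iteration(m=sweeps)
        print(f"m={sweeps}: iterations={iters}, policy={policy}")
```

On this toy problem both settings reach the same value function, and the larger `m` needs no more outer iterations than `m = 1`. The paper additionally tunes its balance parameter adaptively and approximates the value function with an NN, neither of which this sketch attempts.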