A Class of Optimal Control Problem for Stochastic Discrete-Time Systems with Average Reward Reinforcement Learning.

Yifan Hu, Junjie Fu, Yuezu Lv
DOI: https://doi.org/10.1109/ICPS49255.2021.9468152
2021-01-01
Abstract: In this paper, a class of optimal control problems for stochastic discrete-time systems is addressed using average-reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for a Markov decision process (MDP). It is proven that, under the average reward criterion, the admissible policies are gain-optimal and the optimal policy is bias-optimal. Then, sufficient conditions under which the system is almost surely (a.s.) stabilized are proposed. Based on these results, an on-policy average-reward reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.
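The abstract names an on-policy average-reward reinforcement learning algorithm but does not spell it out. As a rough illustration of the general technique only (not the authors' algorithm), the sketch below runs textbook tabular differential SARSA on a hypothetical five-state, discretized scalar system. The dynamics, noise distribution, quadratic cost, step sizes `alpha` and `beta`, and exploration rate `epsilon` are all assumptions. In the average-reward setting the gain is the long-run average reward per step, and the learned action values are differential (bias-like) values measured relative to that gain.

```python
import random

# Hypothetical toy system (an assumption for illustration, not the paper's model):
# x_{k+1} = clip(x_k + u_k + w_k, -2, 2) with i.i.d. noise w_k in {-1, 0, 1}.
STATES = [-2, -1, 0, 1, 2]
ACTIONS = [-1, 0, 1]
NOISE_VALUES = [-1, 0, 1]
NOISE_PROBS = [0.1, 0.8, 0.1]


def step(x, u, rng):
    """Sample one stochastic transition; the reward is a negative quadratic cost."""
    w = rng.choices(NOISE_VALUES, weights=NOISE_PROBS)[0]
    x_next = max(-2, min(2, x + u + w))
    reward = -(x * x + 0.1 * u * u)
    return x_next, reward


def epsilon_greedy(q, x, epsilon, rng):
    """On-policy behaviour: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda u: q[(x, u)])


def differential_sarsa(steps=200_000, alpha=0.02, beta=0.01, epsilon=0.1, seed=0):
    """Tabular differential SARSA (on-policy TD control, average-reward criterion)."""
    rng = random.Random(seed)
    q = {(x, u): 0.0 for x in STATES for u in ACTIONS}  # differential action values
    avg_reward = 0.0  # running estimate of the gain (average reward per step)
    x = rng.choice(STATES)
    u = epsilon_greedy(q, x, epsilon, rng)
    for _ in range(steps):
        x_next, reward = step(x, u, rng)
        u_next = epsilon_greedy(q, x_next, epsilon, rng)
        # TD error measures reward relative to the average reward; no discount factor.
        delta = reward - avg_reward + q[(x_next, u_next)] - q[(x, u)]
        avg_reward += beta * delta
        q[(x, u)] += alpha * delta
        x, u = x_next, u_next
    return q, avg_reward


def greedy_policy(q):
    """Extract the greedy feedback law u = pi(x) from the learned table."""
    return {x: max(ACTIONS, key=lambda u: q[(x, u)]) for x in STATES}


if __name__ == "__main__":
    q, gain = differential_sarsa()
    print("estimated average reward:", round(gain, 3))
    print("greedy policy:", greedy_policy(q))
```

With these assumed settings, the learned greedy law should push the state toward zero (for example, `u = -1` at `x = 1`). Because learning is on-policy, `avg_reward` estimates the gain of the ε-greedy behaviour policy rather than that of the purely greedy policy.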