Average Reward Reinforcement Learning For Semi-Markov Decision Processes

Jiayuan Yang, Yanjie Li, Haoyao Chen, Jiangang Li
DOI: https://doi.org/10.1007/978-3-319-70087-8_79
2017-01-01
Abstract: In this paper, we study new reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) with an average reward criterion. Based on the discrete-time-type Bellman optimality equation, we use incremental value iteration (IVI), stochastic shortest path (SSP) value iteration, and bisection to derive novel RL algorithms in a straightforward way. Each of the resulting algorithms estimates the optimal average reward directly, using IVI, SSP value iteration, or bisection respectively, which addresses the instability of average reward RL. Furthermore, a simulation experiment is used to compare the convergence of these algorithms.
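To illustrate the bisection idea mentioned in the abstract, the following is a minimal model-based sketch, not the paper's model-free algorithm. It assumes a known SMDP in which a reference state is reached with probability 1 under every policy. For a candidate average reward rho, it solves an SSP problem with per-transition reward r - rho * tau, where tau is the expected sojourn time, and bisects on the sign of the reference state's value. The names `ssp_value` and `bisect_average_reward` and the two-state example are hypothetical and chosen only for illustration.

```python
import numpy as np

def ssp_value(P, r, tau, rho, ref=0, iters=1000, tol=1e-10):
    """Optimal total of (r - rho * tau) collected from `ref` until the
    first return to `ref`, computed by value iteration (assumed setup)."""
    P_cut = P.copy()
    P_cut[:, :, ref] = 0.0  # reaching the reference state ends the episode
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r - rho * tau + P_cut @ V  # shape (states, actions)
        V_new = Q.max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V[ref]

def bisect_average_reward(P, r, tau, tol=1e-8):
    """Bisect on rho: the SSP value at `ref` is decreasing in rho and
    is zero at the optimal average reward."""
    ratios = r / tau
    lo, hi = ratios.min(), ratios.max()  # the optimal gain lies in this range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ssp_value(P, r, tau, mid) > 0:
            lo = mid  # candidate is too small
        else:
            hi = mid  # candidate is too large (or exact)
    return 0.5 * (lo + hi)

# Hypothetical two-state SMDP: P[s, a, s'], rewards r[s, a], mean sojourn times tau[s, a].
P = np.zeros((2, 2, 2))
P[0, :, 1] = 1.0  # state 0 always moves to state 1
P[1, :, 0] = 1.0  # state 1 always moves to state 0
r = np.array([[1.0, 3.0], [2.0, 0.0]])
tau = np.array([[1.0, 4.0], [1.0, 1.0]])

rho_star = bisect_average_reward(P, r, tau)
print(rho_star)
```

In this example the action with the larger reward in state 0 also takes longer, so the best cycle uses the other action and earns (1 + 2) / (1 + 1) = 1.5 reward per unit time. This shows why the sojourn time has to enter the optimality equation.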