Reinforcement learning algorithms for semi-Markov decision processes with average reward

Yanjie Li
DOI: https://doi.org/10.1109/ICNSC.2012.6204909
2012-01-01
Abstract: In this paper, we study reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) with average reward from the perspective of performance sensitivity analysis. We first present results on performance sensitivity analysis for average-reward SMDPs. On this basis, two RL algorithms for average-reward SMDPs are studied. The first is the relative value iteration (RVI) RL algorithm, which avoids estimating the optimal average reward during learning. The second is a policy gradient estimation algorithm, which extends the policy gradient estimation algorithm for discrete-time Markov decision processes (MDPs) to SMDPs and requires only half the storage of the existing algorithm.
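To make the RVI idea concrete, the following is a minimal sketch of one tabular update step, and not necessarily the exact algorithm of the paper. It assumes the commonly used form in which a fixed reference entry of the Q-table stands in for the average reward rate, so no separate estimate of the optimal average reward is maintained. The function name `rvi_q_update`, its parameters, and the choice of reference pair are hypothetical and introduced only for illustration.

```python
def rvi_q_update(Q, s, a, reward, tau, s_next, alpha, ref=(0, 0)):
    """One RVI-style Q-learning step for an average-reward SMDP (illustrative sketch).

    Q       : list of lists, Q[state][action]
    reward  : reward accumulated during the sojourn in state s under action a
    tau     : observed sojourn time of that transition
    alpha   : step size
    ref     : (state, action) pair whose Q-value is used in place of the
              average reward rate (assumption: any fixed pair may be chosen)
    """
    # The reference entry replaces an explicit average-reward estimate.
    rho_hat = Q[ref[0]][ref[1]]
    # The sojourn time scales the average-reward correction; tau = 1 gives the MDP case.
    target = reward - rho_hat * tau + max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Usage: a single observed transition (0, action 1) -> 1 with reward 0.5 and sojourn time 2.0.
Q = [[2.0, 0.0], [1.0, 0.0]]
rvi_q_update(Q, s=0, a=1, reward=0.5, tau=2.0, s_next=1, alpha=0.5)
print(Q[0][1])  # -1.25
```

Here the target is 0.5 - 2.0 * 2.0 + 1.0 = -2.5, and half of the gap to the old value 0.0 is applied, giving -1.25. Only the visited entry changes; all other entries, including the reference entry, are left as they were.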