RVI reinforcement learning for semi-Markov decision processes with average reward

Yanjie Li, Fang Cao
DOI: https://doi.org/10.1109/WCICA.2010.5554785
2010-01-01
Intelligent Control and Automation
Abstract: Using the sensitivity-based approach, we study the reinforcement learning problem for semi-Markov decision processes (SMDPs) under the average-reward criterion. We first derive a new Bellman optimality equation. On this basis, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new RVI algorithm can avoid estimating the optimal average reward during learning and has a good convergence rate.
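The abstract does not reproduce the paper's Bellman optimality equation or its update rule, so the following is not the authors' algorithm. As a rough illustration of the relative value idea only, here is a minimal model-based sketch that assumes the SMDP model is known and unichain, and that uses the standard data transformation of an SMDP into an equivalent MDP (p̃ = δ + (η/τ)(p − δ), r̃ = r/τ) before running relative value iteration on Q-factors. The function name `rvi_smdp`, the parameter `eta`, and the toy numbers are all hypothetical. The value at a reference state is subtracted at every sweep, so the optimal average reward is never estimated separately; it is read off from that reference value at the fixed point.

```python
import numpy as np

def rvi_smdp(P, R, T, eta=None, ref_state=0, iters=5000, tol=1e-12):
    """Relative value iteration on a data-transformed SMDP (illustrative sketch).

    P[s, a, s2]: transition probabilities of the embedded chain
    R[s, a]:     expected reward accumulated during one sojourn
    T[s, a]:     expected sojourn time (must be positive)
    """
    n_s, n_a, _ = P.shape
    if eta is None:
        # Assumption: any 0 < eta < min T keeps the transformed matrix
        # stochastic and gives every state a positive self-loop probability.
        eta = 0.5 * T.min()
    eye = np.eye(n_s)[:, None, :]               # delta_{s s2}, broadcast over actions
    P_t = eye + (eta / T)[:, :, None] * (P - eye)
    R_t = R / T                                 # reward per unit time

    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        V = Q.max(axis=1)
        # Subtracting the reference value keeps the iterates bounded.
        Q_new = R_t + P_t @ V - V[ref_state]
        converged = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if converged:
            break
    gain = Q.max(axis=1)[ref_state]             # average reward per unit time
    policy = Q.argmax(axis=1)
    return gain, policy, Q

# Toy 2-state, 2-action SMDP (made-up numbers for illustration).
P = np.array([[[0.7, 0.3], [0.2, 0.8]],
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 3.0], [2.0, 0.5]])
T = np.array([[1.0, 2.5], [1.5, 0.8]])

gain, policy, Q = rvi_smdp(P, R, T)
print("estimated optimal average reward:", gain)
print("greedy policy:", policy)
```

A sample-based learning variant would replace the exact expectation `P_t @ V` with a stochastic-approximation update driven by observed transitions and sojourn times; that step is omitted here because its exact form in the paper is not given in the abstract.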