A unified approach for semi-Markov decision processes with discounted and average reward criteria

Yanjie Li, Huijing Wang, Haoyao Chen
DOI: https://doi.org/10.1109/WCICA.2014.7052983
2014-01-01
Intelligent Control and Automation
Abstract: On the basis of sensitivity-based optimization, we develop a unified optimization approach for semi-Markov decision processes (SMDPs) with infinite-horizon discounted and average reward criteria. We show that the sensitivity formula under the average reward criterion is a limiting case of the formula under the discounted reward criterion. On the basis of these performance sensitivity formulas, we provide a unified formulation of the policy iteration algorithms for SMDPs under both criteria.
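The limiting relationship between the two criteria can be illustrated numerically. The sketch below is not the paper's sensitivity formula. It is a textbook-style check, under assumptions that are not in the abstract: a two-state SMDP under one fixed policy, exponentially distributed sojourn times, and rewards accrued at a constant rate per state. The names `P`, `lam`, `r`, `discounted_value`, and `average_reward` are hypothetical, and all numbers are made up. Under these assumptions, the discounted value scaled by the discount rate should approach the long-run average reward as the discount rate shrinks.

```python
# Illustrative two-state SMDP under a single fixed policy (all numbers are made up).
# Assumptions (not from the abstract): sojourn times are exponential with rate lam[i],
# and reward accrues continuously at rate r[i] while the process sits in state i.

P = [[0.2, 0.8],
     [0.6, 0.4]]      # embedded-chain transition probabilities
lam = [1.0, 2.0]      # sojourn rates; the mean sojourn time in state i is 1 / lam[i]
r = [5.0, 1.0]        # reward rates per unit time


def discounted_value(alpha, iters=50000):
    """Evaluate the discounted value v = c + D P v by fixed-point iteration.

    For an exponential sojourn with rate lam and discount rate alpha > 0:
      expected discounted reward during the sojourn: r / (alpha + lam)
      expected discount factor over the sojourn:     lam / (alpha + lam)
    """
    n = len(P)
    c = [r[i] / (alpha + lam[i]) for i in range(n)]
    d = [lam[i] / (alpha + lam[i]) for i in range(n)]
    v = [0.0] * n
    for _ in range(iters):
        v = [c[i] + d[i] * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def average_reward():
    """Long-run reward per unit time, via the embedded chain's stationary distribution.

    Uses the closed-form stationary distribution of a two-state chain, so this
    helper is specific to the two-state example.
    """
    pi0 = P[1][0] / (P[0][1] + P[1][0])
    pi = [pi0, 1.0 - pi0]
    tau = [1.0 / rate for rate in lam]   # mean sojourn times
    reward_per_transition = sum(pi[i] * r[i] * tau[i] for i in range(2))
    time_per_transition = sum(pi[i] * tau[i] for i in range(2))
    return reward_per_transition / time_per_transition


if __name__ == "__main__":
    eta = average_reward()
    print("average reward:", eta)
    for alpha in (1e-1, 1e-2, 1e-3):
        v = discounted_value(alpha)
        # alpha * v[i] should move toward eta as alpha decreases.
        print("alpha =", alpha, "alpha * v =", [alpha * x for x in v])
```

Fixed-point iteration is used here only to keep the sketch dependency-free; for larger state spaces a direct linear solve would usually be the more practical choice.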