Algorithms for optimization and stabilization of controlled Markov chains

Sean Meyn
DOI: https://doi.org/10.1007/bf02823147
1999-08-01
Sadhana
Abstract: This article reviews some recent results by the author on the optimal control of Markov chains. Two common algorithms for the construction of optimal policies are considered: value iteration and policy iteration. In either case, it is found that the following hold when the algorithm is properly initialized: (i) a stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition); (ii) intermediate costs converge to the optimal cost; (iii) any limiting policy is average cost optimal. The network scheduling problem is considered in some detail, both as an illustration of the theory and because of the strong conclusions that can be reached for this important example as an application of the general theory.
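For readers unfamiliar with the average-cost setting, the following is a minimal sketch of relative value iteration on a finite toy model. It is not taken from the paper, which treats far more general chains and a Lyapunov-function initialization. The function name, the array layout (P indexed as action, state, next state; c indexed as state, action), the zero initialization, and the two-state example are all assumptions made for illustration. Convergence of this simple scheme is also assumed to hold (for example, for aperiodic unichain models).

```python
import numpy as np


def relative_value_iteration(P, c, tol=1e-10, max_iter=10_000):
    """Illustrative relative value iteration for a finite average-cost MDP.

    P: array of shape (A, S, S), P[a, x, y] = probability of x -> y under action a.
    c: array of shape (S, A), c[x, a] = one-step cost of action a in state x.
    Returns (eta, h, policy): average-cost estimate, relative value function
    normalised so that h[0] == 0, and a greedy policy.
    """
    n_states = P.shape[1]
    h = np.zeros(n_states)  # assumption: start from zero on this finite model
    eta = 0.0
    for _ in range(max_iter):
        # Q[x, a] = c(x, a) + sum_y P_a(x, y) h(y)
        Q = c + np.einsum("axy,y->xa", P, h)
        Th = Q.min(axis=1)
        eta = Th[0]            # value at the reference state 0
        h_new = Th - eta       # renormalise so the iterates stay bounded
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    # Greedy policy with respect to the final relative value function.
    Q = c + np.einsum("axy,y->xa", P, h)
    return eta, h, Q.argmin(axis=1)


# Hypothetical two-state, two-action example (not from the paper).
# Action 1 costs 0.2 more per step but moves to the cheap state 0 more often.
P = np.array([
    [[0.5, 0.5], [0.5, 0.5]],   # action 0
    [[0.9, 0.1], [0.9, 0.1]],   # action 1
])
c = np.array([
    [0.0, 0.2],                 # costs in state 0 under actions 0 and 1
    [1.0, 1.2],                 # costs in state 1 under actions 0 and 1
])

eta, h, policy = relative_value_iteration(P, c)
print(eta, h, policy)
```

In this toy model the stationary cost of each of the four deterministic policies can be checked by hand; always taking action 1 gives average cost 0.9 × 0.2 + 0.1 × 1.2 = 0.3, which is the smallest of the four, and the iteration should settle on that policy.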