Recursive Approaches for Single Sample Path Based Markov Reward Processes

Hai‐Tao Fang, Han‐Fu Chen, Xi‐Ren Cao
DOI: https://doi.org/10.1111/j.1934-6093.2001.tb00038.x
IF: 2.4
2001-01-01
Asian Journal of Control
Abstract: In this paper, two single sample path‐based recursive approaches for Markov decision problems are proposed. The first is based on the simultaneous perturbation approach and can be applied to problems with general state spaces, but it converges slowly. It requires a small perturbation of the current parameters to generate a second sample path for comparison, and that perturbation may degrade the system's performance. Hence, we introduce a second approach, which estimates the performance gradient directly by using the theory of performance "potentials". This algorithm is limited to systems with finite state spaces, but it converges faster than the first. Its gradient estimate is obtained from the sample path under the current parameters, without any perturbation, which makes the approach better suited to practical applications.
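The abstract describes the perturbation-free, potential-based approach only in words. The sketch below illustrates the idea under stated assumptions and is not the authors' algorithm. It uses a hypothetical three-state chain whose transition matrix depends on one scalar parameter as P(θ) = (1 − θ)P0 + θP1, a state reward f, and the long-run average reward η = πf. The exact gradient is computed with the performance-potential formula dη/dθ = π (dP/dθ) g, where g solves the Poisson equation (I − P + eπ)g = f. The same quantity is then estimated from one simulated sample path under the current θ, with no parameter perturbation. The likelihood-ratio form of the estimator, the truncation length `L`, and the step size a_k = 1/k in `recursive_optimize` are illustrative choices, not necessarily those used in the paper. The sketch requires NumPy.

```python
import numpy as np


def stationary(P):
    """Stationary distribution pi of an irreducible chain: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def potentials(P, f):
    """Performance potentials g solving the Poisson equation (I - P + e pi) g = f."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), stationary(P)), f)


def exact_gradient(P, dP, f):
    """d eta / d theta = pi (dP/dtheta) g, for eta = pi f with f independent of theta."""
    return float(stationary(P) @ dP @ potentials(P, f))


def sample_path_gradient(P, dP, f, n_steps, L, rng, x0=0):
    """Estimate pi dP g from ONE path simulated under the current P (no perturbation).

    Uses pi dP g = E_pi[dP(X_n, X_{n+1}) / P(X_n, X_{n+1}) * g(X_{n+1})], with g(j)
    replaced by the truncated sum of f(X_l) - eta_hat over L steps starting at X_{n+1}.
    Returns the estimate and the last simulated state, so the path can be continued.
    """
    n_states = P.shape[0]
    cum = np.cumsum(P, axis=1)
    u = rng.random(n_steps + L + 1)
    x = np.empty(n_steps + L + 1, dtype=int)
    x[0] = x0
    for t in range(1, len(x)):
        # Inverse-CDF sampling of the next state from row x[t-1] of P.
        k = int(np.searchsorted(cum[x[t - 1]], u[t], side="right"))
        x[t] = min(k, n_states - 1)
    fx = f[x]
    eta_hat = fx.mean()
    c = np.concatenate([[0.0], np.cumsum(fx - eta_hat)])  # c[k] = sum_{t<k} (f(x_t) - eta_hat)
    m = np.arange(1, n_steps + 1)
    window = c[m + L] - c[m]  # truncated potential estimate started at state x[m]
    i, j = x[:n_steps], x[1:n_steps + 1]
    return float(np.mean(dP[i, j] / P[i, j] * window)), int(x[-1])


def recursive_optimize(P0, P1, f, theta0, n_iters, seg_len, L, rng):
    """Hypothetical recursion theta_{k+1} = clip(theta_k + grad_hat_k / k, 0, 1),
    with each gradient estimate taken from the next segment of one continuing path."""
    theta, x = theta0, 0
    for k in range(1, n_iters + 1):
        P = (1 - theta) * P0 + theta * P1
        grad, x = sample_path_gradient(P, P1 - P0, f, seg_len, L, rng, x0=x)
        theta = float(np.clip(theta + grad / k, 0.0, 1.0))
    return theta


if __name__ == "__main__":
    # Hypothetical 3-state example: P(theta) = (1 - theta) P0 + theta P1, reward f.
    P0 = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
    P1 = np.array([[0.1, 0.2, 0.7], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6]])
    f = np.array([0.0, 1.0, 3.0])
    theta = 0.3
    P, dP = (1 - theta) * P0 + theta * P1, P1 - P0
    rng = np.random.default_rng(0)
    print("exact gradient   :", exact_gradient(P, dP, f))
    print("single-path est. :", sample_path_gradient(P, dP, f, 200_000, 30, rng)[0])
    print("theta after SA   :", recursive_optimize(P0, P1, f, 0.3, 20, 5_000, 30, rng))
```

Because each row of dP/dθ sums to zero, adding a constant to g does not change π (dP/dθ) g. This is why the single-path estimator can subtract a running mean `eta_hat` from the rewards without introducing bias in the limit. The price of avoiding perturbation is the truncation length `L`. A longer `L` reduces truncation bias but increases variance.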