Convergence of stochastic iterative dynamic programming algorithms

Tommi Jaakkola, Michael Jordan, Satinder Singh
1993-01-01
Abstract:Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learn (cid: 173) ing problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the pow (cid: 173) erful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD (") and Q-Iearning belong.
What problem does this paper attempt to address?