Value Iteration for Streaming Data on a Continuous Space with Gradient Method in an RKHS.

Jiamin Liu, Wangli Xu, Yue Wang, Heng Lian
DOI: https://doi.org/10.1016/j.neunet.2023.07.036
IF: 7.8
2023-01-01
Neural Networks
Abstract: The classical theory of reinforcement learning has focused on the tabular setting, where states and actions are finite, or on linear representations of the value function in a finite-dimensional approximation. Establishing theory on general continuous state and action spaces requires a careful treatment of the complexity theory of appropriately chosen function spaces and of the iterative update of the value function when stochastic gradient descent (SGD) is used. For the classical prediction problem in reinforcement learning based on i.i.d. streaming data in the framework of reproducing kernel Hilbert spaces, we establish polynomial sample complexity taking into account the smoothness of the value function. In particular, we prove that the gradient descent algorithm efficiently computes the value function with appropriately chosen step sizes, with a convergence rate that can be close to 1/N, which is the best possible rate for parametric SGD. The advantages of the gradient descent algorithm include its computational convenience and its natural ability to handle streaming data.
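To make the setting in the abstract concrete, the following is a minimal sketch of a kernel-based stochastic semi-gradient update for value prediction on streaming transitions. It is not the authors' algorithm: the Gaussian kernel, its bandwidth, the toy reward and transition model, the 1/sqrt(t) step-size schedule, and all names (`gaussian_kernel`, `KernelValueEstimate`, `run_stream`) are assumptions chosen for illustration only.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.2):
    """Gaussian (RBF) kernel on the real line; the bandwidth is an assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


class KernelValueEstimate:
    """Value estimate stored as a kernel expansion V(s) = sum_i alpha_i * K(c_i, s)."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.centers = []  # states seen so far
        self.alphas = []   # expansion coefficients

    def __call__(self, s):
        return sum(a * self.kernel(c, s) for c, a in zip(self.centers, self.alphas))

    def update(self, s, r, s_next, gamma, step):
        """One streaming update from a single transition (s, r, s_next)."""
        # Temporal-difference error under the current estimate.
        td_error = r + gamma * self(s_next) - self(s)
        # Functional semi-gradient step: V <- V + step * td_error * K(s, .),
        # which appends one new kernel center with coefficient step * td_error.
        self.centers.append(s)
        self.alphas.append(step * td_error)
        return td_error


def run_stream(n_samples=300, gamma=0.5, seed=0):
    """Process a stream of i.i.d. transitions from a toy problem on [0, 1]."""
    rng = random.Random(seed)
    v = KernelValueEstimate(gaussian_kernel)
    for t in range(1, n_samples + 1):
        s = rng.random()           # state drawn i.i.d. from Uniform[0, 1]
        s_next = rng.random()      # toy transition: next state independent of s
        r = s                      # toy reward (hypothetical)
        step = 1.0 / math.sqrt(t)  # decaying step size (an assumption)
        v.update(s, r, s_next, gamma, step)
    return v


if __name__ == "__main__":
    v = run_stream()
    print(len(v.centers), v(0.2), v(0.8))
```

Each sample is used once and then only its state and coefficient are kept, which is what makes this kind of update convenient for streaming data. The cost of evaluating the estimate grows with the number of stored centers, a trade-off this sketch does not address.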