Provably Efficient Reinforcement Learning with General Value Function Approximation.

Ruosong Wang, Ruslan Salakhutdinov, Lin F. Yang
2020-01-01
Abstract: Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite some recent progress on developing theory for RL with linear function approximation, the understanding of general function approximation schemes largely remains missing. In this paper, we establish a provably efficient RL algorithm with general value function approximation. We show that if the value functions admit an approximation with a function class ℱ, our algorithm achieves a regret bound of O(poly(dH)√T), where d is a complexity measure of ℱ that depends on the eluder dimension [Russo and Van Roy, 2013] and log-covering numbers, H is the planning horizon, and T is the number of interactions with the environment. Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment. Moreover, our algorithm is model-free and provides a framework to justify the effectiveness of algorithms used in practice.