Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning

Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
2019-01-01
Abstract: Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q(σ,λ) is the first approach that unifies them with eligibility traces through the sampling degree σ. However, it is limited to the tabular case: for large-scale learning, Q(σ,λ) is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ(σ,λ), which extends tabular Q(σ,λ) with linear function approximation. We prove the convergence of GQ(σ,λ). Empirical results on some standard domains show that GQ(σ,λ) with a combination of full-sampling and pure-expectation achieves better performance than either full-sampling or pure-expectation methods alone.
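To make the role of the sampling degree σ concrete, the following is a minimal Python sketch of the standard one-step Q(σ) target combined with a linear value estimate. It is an illustration only and is not the GQ(σ,λ) algorithm of the paper: it omits eligibility traces and uses a plain semi-gradient update rather than the paper's gradient-based method. The function names `q_sigma_target` and `semi_gradient_step`, the feature vector `phi_sa`, and all numeric values are hypothetical choices made for this example.

```python
import numpy as np


def q_sigma_target(reward, gamma, sigma, q_next, pi_next, a_next):
    """One-step Q(sigma) target (illustrative, no eligibility traces).

    sigma = 1 uses only the sampled next action (full-sampling).
    sigma = 0 uses only the expectation under the policy (pure-expectation).
    Values in between interpolate linearly between the two.
    """
    sampled = q_next[a_next]                   # value of the sampled next action
    expected = float(np.dot(pi_next, q_next))  # expected value under pi(.|s')
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def semi_gradient_step(theta, phi_sa, target, alpha):
    """Semi-gradient update for a linear estimate q(s, a) = theta . phi(s, a).

    Assumption: this is a simplified stand-in, not the update derived in the paper.
    """
    td_error = target - float(np.dot(theta, phi_sa))
    return theta + alpha * td_error * phi_sa


if __name__ == "__main__":
    q_next = np.array([1.0, 3.0])   # hypothetical action values at the next state
    pi_next = np.array([0.5, 0.5])  # hypothetical policy probabilities at the next state
    for sigma in (0.0, 0.5, 1.0):
        target = q_sigma_target(1.0, 0.9, sigma, q_next, pi_next, a_next=1)
        print(f"sigma={sigma}: target={target:.2f}")

    theta = semi_gradient_step(np.zeros(2), np.array([1.0, 0.0]), target, alpha=0.1)
    print("updated theta:", theta)
```

With these made-up numbers, the target moves from the pure-expectation value at σ = 0 toward the full-sampling value at σ = 1, which is the interpolation that the sampling degree controls.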