Finite-Time Analysis of Asynchronous Q-Learning Under Diminishing Step-Size From Control-Theoretic View

Han-Dong Lim, Donghwan Lee
DOI: https://doi.org/10.1109/access.2024.3476564
IF: 3.9
2024-10-19
IEEE Access
Abstract: Q-learning has long been one of the most popular reinforcement learning algorithms, and theoretical analysis of Q-learning has been an active research topic for decades. Although research on the asymptotic convergence of Q-learning has a long tradition, non-asymptotic convergence has only recently come under active study. The main goal of this paper is to present a new finite-time analysis of asynchronous Q-learning under Markovian observation models from a control system viewpoint. In particular, we introduce a discrete-time, time-varying switching system model of Q-learning with diminishing step-sizes. This model significantly improves on recent switching system analyses with constant step-sizes and leads to a convergence rate that is comparable to or better than most state-of-the-art results in the literature. In addition, we use the continuous-time Lyapunov equation to avoid the difficulty that diminishing step-sizes pose for the analysis in discrete time. The proposed analysis brings additional insights, covers different scenarios, and provides new simplified templates for analysis that deepen our understanding of Q-learning via its unique connection to discrete-time switching systems.
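For readers unfamiliar with the setting, the following is a minimal sketch of asynchronous Q-learning along a single Markovian trajectory with a diminishing step-size. It is an illustration only and is not taken from the paper: the function name `async_q_learning`, the uniform behavior policy, and the schedule `theta / (k + xi)` with its parameters `theta` and `xi` are all assumptions chosen for the example.

```python
import random


def async_q_learning(P, R, gamma=0.9, steps=20000, theta=10.0, xi=10.0, seed=0):
    """Asynchronous Q-learning on a tabular MDP (illustrative sketch).

    P[s][a] is a list of next-state probabilities and R[s][a] is a
    deterministic reward. Only the visited (s, a) entry is updated at
    each step, and samples come from one trajectory (Markovian samples).
    The schedule alpha_k = theta / (k + xi) is an assumed generic
    diminishing step-size, not necessarily the one used in the paper.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0  # start of the trajectory
    for k in range(steps):
        a = rng.randrange(n_actions)  # uniform behavior policy (assumption)
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        alpha = min(1.0, theta / (k + xi))  # diminishing step-size, capped at 1
        td_error = R[s][a] + gamma * max(Q[s_next]) - Q[s][a]
        Q[s][a] += alpha * td_error  # asynchronous update of a single entry
        s = s_next  # continue along the same trajectory
    return Q
```

A call such as `async_q_learning([[[1.0]]], [[1.0]], gamma=0.5, steps=2000)` runs the update on a one-state, one-action chain, where the fixed point is Q = 1 / (1 - 0.5) = 2.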
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic