A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation

Zaiwei Chen, Siva T. Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
DOI: https://doi.org/10.1287/opre.2022.0249
IF: 2.7
2023-10-08
Operations Research
Abstract: The stochastic approximation (SA) method stands as the foundational mathematical tool for modern large-scale optimization and machine learning. Therefore, gaining a theoretical understanding of SA algorithms is of fundamental interest. In their paper titled “A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation,” Chen et al. present a unified Lyapunov framework for the finite-sample analysis of a Markovian SA algorithm under a contractive operator with respect to an arbitrary norm. The key novelty lies in the construction of a smooth Lyapunov function called the generalized Moreau envelope. The authors demonstrate the effectiveness of their SA results in the context of reinforcement learning (RL), specifically through popular algorithms such as variants of temporal difference (TD) learning and Q-learning. As byproducts, the results provide theoretical insights into the efficiency of bootstrapping in TD learning with eligibility traces and the bias-variance tradeoff in off-policy learning.
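To make the setting in the abstract concrete, the following is a minimal sketch of tabular TD(0) run along a single trajectory of a Markov chain, which is one simple instance of stochastic approximation with Markovian samples and a contractive operator (the Bellman operator is a gamma-contraction in the sup norm). The two-state chain `P`, the rewards `r`, the discount `gamma`, the step-size schedule, and the function name `td0_markovian` are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def td0_markovian(P, r, gamma, num_steps, seed=0):
    """Tabular TD(0) driven by one trajectory of the chain P (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)   # current estimate of the value function
    s = 0             # start state (arbitrary choice)
    for k in range(num_steps):
        # Markovian noise: the next sample depends on the current state.
        s_next = rng.choice(n, p=P[s])
        # Diminishing step size (an assumed schedule, not the paper's).
        step = 1.0 / (1.0 + 0.01 * k)
        # SA update: move V[s] toward the sampled Bellman target.
        V[s] += step * (r[s] + gamma * V[s_next] - V[s])
        s = s_next
    return V

# Illustrative two-state Markov reward process.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.5

# Fixed point of the Bellman operator, V = r + gamma * P V, for comparison.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)
V_hat = td0_markovian(P, r, gamma, num_steps=20000)
print("TD(0) estimate:", V_hat)
print("Bellman fixed point:", V_true)
```

Comparing `V_hat` with `V_true` shows the iterate settling near the fixed point of the contraction. Finite-sample results of the kind described in the abstract bound how quickly this gap shrinks in terms of the step sizes and the mixing of the chain.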
management, operations research & management science