Abstract: Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a commonly used V4 Lyapunov drift condition, and which trivially holds if the Markov chain is finite and irreducible.
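
The kind of iteration the abstract describes can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `run_markov_sa`, the two-state transition matrix `P`, the linear update `H(x, y) = A[y] @ x + b[y]`, and the step size `1 / (n + 1)`. The sketch runs a stochastic approximation update driven by a finite, irreducible Markov chain and records the largest iterate norm seen, which is the quantity that stability asks to be finite.

```python
import numpy as np


def run_markov_sa(num_steps=5000, seed=0):
    """Run x_{n+1} = x_n + alpha(n) * H(x_n, Y_{n+1}) with Markovian noise Y."""
    rng = np.random.default_rng(seed)
    # Hypothetical 2-state irreducible transition matrix (each row sums to 1).
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    # Hypothetical state-dependent linear update H(x, y) = A[y] @ x + b[y].
    A = np.array([[[-1.0, 0.5], [0.0, -2.0]],
                  [[-0.5, 0.0], [0.3, -1.0]]])
    b = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
    x = np.array([10.0, -10.0])   # arbitrary starting point
    y = 0                         # initial Markov state
    sup_norm = np.linalg.norm(x)  # running value of sup_n ||x_n||
    for n in range(num_steps):
        y = rng.choice(2, p=P[y])  # sample the next state Y_{n+1}
        alpha = 1.0 / (n + 1)      # positive, decreasing, non-summable step size
        x = x + alpha * (A[y] @ x + b[y])
        sup_norm = max(sup_norm, np.linalg.norm(x))
    return x, sup_norm


if __name__ == "__main__":
    final_x, sup_norm = run_markov_sa()
    print("final iterate:", final_x)
    print("largest norm observed:", sup_norm)
```

A single finite run like this cannot establish almost sure boundedness; it only shows what the iterates and the noise process look like in the Markovian setting.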

What problem does this paper attempt to address?

This paper addresses the stability of stochastic approximation algorithms under Markovian noise. Specifically, it extends the well-known Borkar-Meyn theorem to the Markovian noise setting, which broadens its applicability in reinforcement learning, especially for off-policy reinforcement learning algorithms with linear function approximation and eligibility traces.

### Problems the paper attempts to solve

1. **Stability problem**: One of the core challenges in analyzing a stochastic approximation algorithm is establishing its stability, that is, proving that the stochastic vector iterates are almost surely bounded. The paper does this by extending the Borkar-Meyn theorem so that it can handle Markovian noise.
2. **Handling Markovian noise**: The original Borkar-Meyn theorem assumes martingale difference noise, which limits its use in many reinforcement learning problems, where the noise is typically driven by a Markov chain. The paper introduces new assumptions and techniques that let the theorem cover Markovian noise.
3. **Applicability in reinforcement learning**: The paper demonstrates its results on reinforcement learning algorithms, especially off-policy algorithms with linear function approximation and eligibility traces. These algorithms are common in practice, so strengthening their theoretical basis matters.

### Main contributions

1. **Extension of the Borkar-Meyn theorem**: The paper proposes more general assumptions under which the Borkar-Meyn theorem applies to Markovian noise. This includes an analysis of the asymptotic rate of change of the function \( H \) and the use of a Lyapunov drift condition on the Markov chain.
2. **New technical methods**: The paper introduces scaled iterates and uses the Arzelà-Ascoli theorem and the Moore-Osgood theorem to prove key convergence results.
3. **Verification on applications**: The paper checks its theoretical results against specific reinforcement learning algorithms, including off-policy algorithms with linear function approximation and eligibility traces.

### Key assumptions and results

- **Assumption 1**: The Markov chain \(\{Y_n\}\) has a unique invariant probability measure.
- **Assumption 2**: The learning rates \(\{\alpha(i)\}\) are positive, decreasing, and satisfy \(\sum_{i = 0}^{\infty}\alpha(i)=\infty\).
- **Assumption 3**: Specifies how the function \( H_c \) converges to \( H_\infty \).
- **Assumption 4**: The functions \( H_c \) and \( H_\infty \) are Lipschitz continuous.
- **Assumption 5**: The function \( h_c(x) \) converges uniformly on any compact set to \( h_\infty(x) \), and 0 is a globally asymptotically stable equilibrium point of the ODE \(\frac{dx(t)}{dt}=h_\infty(x(t))\).
- **Assumption 6**: The learning rates \(\{\alpha(n)\}\) satisfy certain conditions, and the Markov chain satisfies strong-law-type conditions.

### Main theorems

- **Theorem 1**: If Assumptions 1-5 hold and either Assumption 6 or 6' holds, then the iterates \(\{x_n\}\) generated by (1) are stable, that is, \(\sup_n\|x_n\|<\infty\) almost surely.
- **Corollary 1**: If Assumptions 1-5 hold and either Assumption 6 or 6' holds, then the iterates \(\{x_n\}\) generated by (1) converge almost surely to a bounded invariant set of the ODE \(\frac{dx(t)}{dt}=h(x(t))\).

Through these contributions, the paper provides a theoretical basis for analyzing stochastic approximation algorithms under Markovian noise and widens the range of reinforcement learning algorithms to which the Borkar-Meyn stability argument applies.
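
The summary refers to the iteration (1) and to the functions \( H_c \), \( H_\infty \), \( h_c \), \( h_\infty \), and \( h \) without writing them out. The block below is a sketch based on the standard Borkar-Meyn scaling construction, so it is an assumption about the paper's notation rather than a quotation from it. The symbol \( d_{\mathcal{Y}} \) is a name introduced here for the invariant probability measure of Assumption 1.

```latex
\begin{align*}
  x_{n+1} &= x_n + \alpha(n)\, H(x_n, Y_{n+1})
    && \text{(the iteration referred to as (1))} \\
  H_c(x, y) &= \frac{H(cx, y)}{c}, \qquad
  H_\infty(x, y) = \lim_{c \to \infty} H_c(x, y)
    && \text{(scaled update and its limit)} \\
  h(x) &= \mathbb{E}_{y \sim d_{\mathcal{Y}}}\big[ H(x, y) \big]
    && \text{(mean field under the invariant measure)} \\
  h_c(x) &= \mathbb{E}_{y \sim d_{\mathcal{Y}}}\big[ H_c(x, y) \big], \qquad
  h_\infty(x) = \mathbb{E}_{y \sim d_{\mathcal{Y}}}\big[ H_\infty(x, y) \big]
    && \text{(scaled mean fields)}
\end{align*}
```

Under this reading, Assumption 5 says that the ODE driven by the limiting mean field \( h_\infty \) pulls every trajectory to the origin, which is what rules out iterates escaping to infinity.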

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem

Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise

Stochastic Optimization with Non-stationary Noise: the Power of Moment Estimation

Markovian Foundations for Quasi-Stochastic Approximation with Applications to Extremum Seeking Control

Stochastic Optimization with Non-stationary Noise

Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Markovian Foundations for Quasi-Stochastic Approximation in Two Timescales: Extended Version

Central Limit Theorem for Two-Timescale Stochastic Approximation with Markovian Noise: Theory and Applications

Finite-Time Error Bounds of Biased Stochastic Approximation With Application to TD-Learning

A General Framework for Analyzing Stochastic Dynamics in Learning Algorithms

Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning

Stochastic LQ optimal control for Markov jumping systems with multiplicative noise using reinforcement learning

Multiplicative noise and heavy tails in stochastic optimization

Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games

Perturbed iterate analysis for asynchronous stochastic optimization

Computing the Bias of Constant-step Stochastic Approximation with Markovian Noise

First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities