Abstract: Statistical inference with finite-sample validity for the value function of a given policy in Markov decision processes (MDPs) is crucial for ensuring the reliability of reinforcement learning. Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose. In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results. First, we derive a novel sharp high-dimensional probability convergence guarantee that depends explicitly on the asymptotic variance and holds under weak conditions. We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature. Finally, we propose a plug-in estimator for the asymptotic covariance matrix, designed for efficient online computation. These results enable the construction of confidence regions and simultaneous confidence intervals for the linear parameters of the value function, with guaranteed finite-sample coverage. We demonstrate the applicability of our theoretical findings through numerical experiments.
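
For readers unfamiliar with the estimator the abstract refers to, the following is a minimal sketch of TD(0) with linear function approximation and Polyak-Ruppert averaging. It is an illustration under stated assumptions, not the paper's implementation: the function name `averaged_td`, the polynomially decaying step size `eta0 * t**(-alpha)`, and the three-state toy chain are all hypothetical choices made for this example. The plug-in covariance estimator and the confidence regions are not shown.

```python
import numpy as np


def averaged_td(P, r, Phi, gamma=0.5, n_steps=30000, eta0=0.5, alpha=0.6, seed=0):
    """TD(0) with linear features, returning the Polyak-Ruppert average.

    P     : (S, S) transition matrix of the chain induced by the policy
    r     : (S,) expected reward in each state
    Phi   : (S, d) feature matrix, one row per state
    eta0, alpha : assumed step-size schedule eta_t = eta0 * t**(-alpha)
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)        # current TD iterate
    theta_bar = np.zeros(d)    # running average of the iterates
    s = rng.integers(n_states)
    for t in range(1, n_steps + 1):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition (s, r[s], s_next).
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + eta0 * t ** (-alpha) * td_error * Phi[s]
        # Incremental update of the average of theta_1, ..., theta_t.
        theta_bar += (theta - theta_bar) / t
        s = s_next
    return theta_bar


# Toy example (assumed): a 3-state chain with tabular (one-hot) features,
# for which the TD fixed point equals the true value function.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([0.0, 0.5, 1.0])
Phi = np.eye(3)
gamma = 0.5

theta_hat = averaged_td(P, r, Phi, gamma=gamma)
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # solves V = r + gamma * P V
max_abs_error = np.max(np.abs(Phi @ theta_hat - v_true))
```

With tabular features the averaged estimate can be compared directly with the value function obtained from the Bellman equation, which is what `max_abs_error` measures. With general features the comparison target would instead be the TD fixed point.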

Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Finite-Time Bounds for AMSGrad-Enhanced Neural TD

Decentralized Adaptive TD $(\lambda)$ Learning with Linear Function Approximation: Nonasymptotic Analysis

Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Provable distributed adaptive temporal-difference learning over time-varying networks

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

Theoretical analysis of deep neural networks for temporally dependent observations

Target-Based Temporal Difference Learning

Finite-Time Error Bounds of Biased Stochastic Approximation With Application to TD-Learning

Geometric Insights into the Convergence of Nonlinear TD Learning

Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning

Reanalysis of Variance Reduced Temporal Difference Learning

Adaptive deep learning for nonlinear time series models

Why Target Networks Stabilise Temporal Difference Methods