Abstract: A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks. This adaptability allows them to respond to shifts in the data-generating distribution, which are often inevitable over time. However, in Continual Learning (CL) settings, models often struggle to balance learning new tasks (plasticity) with retaining previous knowledge (memory stability). Consequently, they are susceptible to Catastrophic Forgetting, which degrades performance and undermines the reliability of deployed systems. Variational Continual Learning methods tackle this challenge with a learning objective that recursively updates the posterior distribution and constrains it to stay close to the latest posterior estimate. Nonetheless, we argue that these methods may be ineffective because approximation errors compound over successive recursions. To mitigate this, we propose new learning objectives that integrate the regularization effects of multiple previous posterior estimates, preventing individual errors from dominating future posterior updates and compounding over time. We reveal connections between these objectives and Temporal-Difference methods, a popular learning mechanism in Reinforcement Learning and Neuroscience. We evaluate the proposed objectives on challenging versions of popular CL benchmarks, demonstrating that they outperform standard Variational CL methods and non-variational baselines and effectively alleviate Catastrophic Forgetting.
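
The recursive objective described in the abstract can be written out as a short sketch. The notation is assumed here and does not come from the text above: $q_t$ is the approximate posterior after task $t$ (with $q_0$ the prior), $\mathcal{D}_t$ is the data of task $t$, $\theta$ denotes the model parameters, $\mathcal{Q}$ is the variational family, and $n \le t$ is a window size. The first equation is the standard Variational CL objective. The second follows from the identity $p(\theta \mid \mathcal{D}_{1:t}) \propto p(\theta \mid \mathcal{D}_{1:t-k}) \prod_{j=t-k+1}^{t} p(\mathcal{D}_j \mid \theta)$ for an earlier estimate $q_{t-k}$. The third averages over $k$ with uniform weights, which is only one illustrative way of combining multiple previous estimates and is not necessarily the exact objective the paper proposes.

```latex
% Sketch under assumed notation; the uniform 1/n weighting is an illustrative assumption.
\begin{align}
% Standard VCL: regularize only toward the latest posterior estimate q_{t-1}.
q_t &= \arg\max_{q \in \mathcal{Q}} \;
  \mathbb{E}_{\theta \sim q}\big[\log p(\mathcal{D}_t \mid \theta)\big]
  - \mathrm{KL}\big(q \,\|\, q_{t-1}\big), \\
% Same target posterior, expressed through an older estimate q_{t-k}
% plus the likelihoods of the k tasks seen since then.
q_t &= \arg\max_{q \in \mathcal{Q}} \;
  \mathbb{E}_{\theta \sim q}\Big[\sum_{j=t-k+1}^{t} \log p(\mathcal{D}_j \mid \theta)\Big]
  - \mathrm{KL}\big(q \,\|\, q_{t-k}\big), \\
% Averaging over k = 1, ..., n spreads the regularization across n estimates,
% so an error in any single q_{t-k} carries weight 1/n.
q_t &= \arg\max_{q \in \mathcal{Q}} \;
  \frac{1}{n} \sum_{k=1}^{n} \Big\{
  \mathbb{E}_{\theta \sim q}\Big[\sum_{j=t-k+1}^{t} \log p(\mathcal{D}_j \mid \theta)\Big]
  - \mathrm{KL}\big(q \,\|\, q_{t-k}\big) \Big\}.
\end{align}
```

With $n = 1$ the third equation reduces to the first, so the standard objective is recovered as a special case of this sketch.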

A Variance Minimization Approach to Temporal-Difference Learning

Reanalysis of Variance Reduced Temporal Difference Learning

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP

Gradient Descent Temporal Difference-Difference Learning

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

Online Sparse Temporal Difference Learning Based on Nested Optimization and Regularized Dual Averaging

A Convergent Off-Policy Temporal Difference Algorithm

Off-Policy Temporal Difference Learning with Bellman Residuals

Temporal-Difference Variational Continual Learning

An Optimistic Value Iteration for Mean–Variance Optimization in Discounted Markov Decision Processes

Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Toward Efficient Gradient-Based Value Estimation

An Iterative Approach to Reduce the Variance of Stochastic Dynamic Systems

Temporal Difference Learning as Gradient Splitting

Almost Sure Convergence of Average Reward Temporal Difference Learning

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning