Abstract: Reinforcement learning (RL) methods are powerful tools for learning how to interact with environments and have been successfully applied to various important applications. Q-learning, one of the most popular RL methods, uses the Bellman equation to update the Q-function. Because data collection in RL is costly in both time and money and Q-learning converges slowly, various transfer RL algorithms have been designed to improve the sample complexity of new tasks. (To avoid confusion, we use “old/new tasks” instead of “source/target tasks” in this paper.) However, most previous transfer RL algorithms resemble the transfer learning methods used in deep learning and are heuristic, with no theoretical guarantee on the convergence rate. It is therefore important to understand clearly how and when transfer learning helps RL methods, and to provide a theoretical guarantee for the improvement in sample complexity. In this paper, we rethink the transfer RL problem from the RL perspective and propose to transfer the Q-function learned in the old task to the target Q-function in the Q-learning of the new task. We call this new transfer Q-learning method target transfer Q-learning (TTQL). The transfer process is controlled by an error condition, which helps avoid harm to the new task from the transferred target. We design the error condition in TTQL as whether the Bellman error of the transferred target Q-function is smaller than that of the current Q-function. We show that TTQL with the error condition achieves a faster convergence rate than Q-learning. Our experiments are consistent with our theoretical results and verify the effectiveness of the proposed target transfer Q-learning method.
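The idea described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the abstract does not say how the Bellman error is estimated, so the sketch assumes a tiny MDP with known transition probabilities `P` and rewards `R` and computes the max-norm Bellman error exactly. The names `bellman_error`, `ttql`, `Q_old`, and the two-state example MDP are all hypothetical and chosen only for illustration.

```python
import random


def bellman_error(Q, P, R, gamma):
    """Max-norm Bellman error: max over (s, a) of |(TQ)(s, a) - Q(s, a)|.

    Assumption: P and R are known, so the backup TQ can be computed exactly.
    """
    err = 0.0
    for s in range(len(Q)):
        for a in range(len(Q[s])):
            backup = R[s][a] + gamma * sum(
                p * max(Q[s2]) for s2, p in enumerate(P[s][a])
            )
            err = max(err, abs(backup - Q[s][a]))
    return err


def ttql(P, R, Q_old, gamma=0.9, alpha=0.1, steps=5000, seed=0):
    """Tabular Q-learning whose bootstrap target may come from Q_old.

    Returns the learned Q-table and the number of steps that used Q_old.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    old_err = bellman_error(Q_old, P, R, gamma)  # Q_old is fixed: compute once
    transfers = 0
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)  # uniform random exploration (assumption)
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # Error condition: use the transferred target only while Q_old has a
        # smaller Bellman error than the current Q.
        use_old = old_err < bellman_error(Q, P, R, gamma)
        target_Q = Q_old if use_old else Q
        transfers += use_old
        td_target = R[s][a] + gamma * max(target_Q[s_next])
        Q[s][a] += alpha * (td_target - Q[s][a])
        s = s_next
    return Q, transfers


# Hypothetical 2-state, 2-action deterministic MDP: action a leads to state a.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0], [0.0, 2.0]]
# Assumed old-task Q-function; here it equals the optimal Q of this MDP.
Q_old = [[17.1, 19.0], [17.1, 20.0]]

Q_new, n_transfers = ttql(P, R, Q_old)
print(Q_new, n_transfers)
```

When `Q_old` has a larger Bellman error than the current Q-table, the update falls back to the ordinary Q-learning target, which is the role the abstract assigns to the error condition. Recomputing the exact Bellman error at every step is only feasible for a toy problem like this one.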

Final Iteration Convergence Bound of Q-Learning: Switching System Approach

Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach

Unified ODE Analysis of Smooth Q-Learning Algorithms

Finite-Time Analysis of Asynchronous Q-Learning Under Diminishing Step-Size From Control-Theoretic View

Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

Convergence Analysis of an Incremental Approach to Online Inverse Reinforcement Learning

Finite-Time Analysis of Simultaneous Double Q-learning

Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning

Error Bound Analysis of Q-Function for Discounted Optimal Control Problems With Policy Iteration

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Deep Q-Learning: Theoretical Insights from an Asymptotic Analysis

Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

Analysis of Iterative Learning Control for a Class of Linear Discrete-Time Switched Systems

Suppressing Overestimation in Q-Learning through Adversarial Behaviors

Target Transfer Q-Learning and Its Convergence Analysis

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate

Convergent and Efficient Deep Q Network Algorithm