Abstract: This paper revisits the estimation bias control problem of Q-learning, motivated by the fact that estimation bias is not always harmful: some environments benefit from overestimation or underestimation bias, while others suffer from it. Unlike previous coarse-grained bias control methods, this paper proposes a fine-grained bias control algorithm called Order Q-learning. It uses an order statistic of multiple independent Q-tables to control the bias and flexibly meet the bias needs of different environments, i.e., the bias varies from underestimation to overestimation as one selects a higher-order Q-value. We derive the expected estimation bias together with its lower and upper bounds. They reveal that the expected estimation bias is inversely proportional to the number of Q-tables and proportional to the index of the order statistic function. To show the versatility of Order Q-learning, we design an adaptive parameter adjustment strategy, leading to AdaOrder (Adaptive Order) Q-learning. It adaptively selects the number of Q-tables and the index of the order statistic function based on the number of visits to each state-action pair and the average Q-value. We extend Order Q-learning and AdaOrder Q-learning to the large-scale setting with function approximation, leading to Order DQN and AdaOrder DQN, respectively. Finally, we consider two experimental settings: deep reinforcement learning experiments show that our method substantially outperforms several state-of-the-art (SOTA) baselines, and tabular MDP experiments reveal fundamental insights into why our method achieves superior performance. Our supplementary file can be found at https://1drv.ms/f/s!Atddp1iaDmL2gjv31CaGquw5WwYI.
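The abstract gives no pseudocode, so the following is only a minimal sketch of how an order-statistic target over several Q-tables might look in the tabular case. It assumes the target takes, for each action, the k-th smallest Q-value across the N tables and then the greedy maximum over actions, and that a single randomly chosen table is updated per step. The function names `order_target` and `order_q_update`, the array layout, and the hyperparameter defaults are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def order_target(q_tables, next_state, k):
    """Greedy value of the k-th order statistic across Q-tables.

    q_tables has shape (N, S, A). k is 1-indexed: k=1 picks the minimum
    across tables (most pessimistic), k=N picks the maximum (most optimistic).
    This layout and indexing are assumptions made for illustration.
    """
    next_q = q_tables[:, next_state, :]   # shape (N, A)
    ordered = np.sort(next_q, axis=0)     # ascending across tables, per action
    return ordered[k - 1].max()           # k-th smallest per action, then max


def order_q_update(q_tables, s, a, r, s_next, done, k,
                   alpha=0.1, gamma=0.99, rng=None):
    """One tabular update step; returns the index of the table updated."""
    rng = np.random.default_rng() if rng is None else rng
    bootstrap = 0.0 if done else gamma * order_target(q_tables, s_next, k)
    target = r + bootstrap
    i = int(rng.integers(q_tables.shape[0]))  # assumed: update one random table
    q_tables[i, s, a] += alpha * (target - q_tables[i, s, a])
    return i


# Three tables, two states, two actions.
q = np.zeros((3, 2, 2))
q[:, 1, :] = [[1.0, 4.0], [2.0, 0.0], [3.0, 5.0]]
targets = [order_target(q, next_state=1, k=k) for k in (1, 2, 3)]
```

In this toy setup the target value grows as `k` increases, which mirrors the abstract's statement that selecting a higher-order Q-value moves the estimate from underestimation toward overestimation.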

Adaptive Order Q-learning
