Abstract: Experience replay is a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by their temporal difference (TD) error, which empirically improves performance. However, few works have examined the motivation for using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting by showing the connection between experience prioritization and occupancy optimization. Using a regularized RL objective with an $f$-divergence regularizer and employing its dual form, we show that an optimal solution to the objective is obtained by shifting the distribution of off-policy data in the replay buffer towards the on-policy optimal distribution using TD-error-based occupancy ratios. Our derivation results in a new pipeline for TD-error prioritization. We specifically explore the KL divergence as the regularizer and obtain a new prioritization scheme, regularized optimal experience replay (ROER). We evaluate the proposed scheme with the Soft Actor-Critic (SAC) algorithm on continuous control MuJoCo and DM Control benchmark tasks, where it outperforms baselines in 6 out of 11 tasks, while results on the remaining tasks match or do not deviate far from the baselines. Further, with pretraining, ROER achieves noticeable improvement on the difficult Antmaze environment where baselines fail, showing applicability to offline-to-online fine-tuning. Code is available at \url{https://github.com/XavierChanglingLi/Regularized-Optimal-Experience-Replay}.
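
To make the idea of TD-error-based occupancy ratios under a KL regularizer more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that the KL regularizer leads to weights proportional to the exponential of the TD error divided by a temperature, which follows from the convex conjugate of $f(x) = x \log x$. The function name `kl_priority_weights`, the temperature `beta`, and the clipping bounds `w_min` and `w_max` are hypothetical and chosen only for illustration.

```python
import math


def kl_priority_weights(td_errors, beta=1.0, w_min=0.1, w_max=10.0):
    """Map TD errors to sampling probabilities (illustrative sketch only).

    Assumption: weights are proportional to exp(td_error / beta), the
    exponential form suggested by a KL regularizer. The clipping bounds
    are hypothetical stabilizers, not values taken from the paper.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not td_errors:
        raise ValueError("td_errors must be non-empty")

    # Subtract the maximum before exponentiating to avoid overflow;
    # this only rescales every weight by the same constant.
    max_td = max(td_errors)
    raw = [math.exp((d - max_td) / beta) for d in td_errors]

    # Rescale so the average weight is 1, then clip extreme ratios.
    mean_raw = sum(raw) / len(raw)
    clipped = [min(max(r / mean_raw, w_min), w_max) for r in raw]

    # Normalize the clipped weights into a sampling distribution.
    total = sum(clipped)
    return [c / total for c in clipped]


if __name__ == "__main__":
    probs = kl_priority_weights([0.0, 1.0, 2.0], beta=1.0)
    print(probs)
```

In this sketch, transitions with larger TD error are sampled more often, and a larger `beta` flattens the distribution towards uniform sampling. The actual scheme, including how priorities are updated over training, is defined in the paper and the linked repository.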

Curiosity-tuned experience replay for wargaming decision modeling without reward-engineering

ACDER: Augmented Curiosity-Driven Experience Replay

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Random curiosity-driven exploration in deep reinforcement learning

CIER: A Novel Experience Replay Approach with Causal Inference in Deep Reinforcement Learning

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

ROER: Regularized Optimal Experience Replay

Experience Selection In Multi-Agent Deep Reinforcement Learning

Episodic Reinforcement Learning with Expanded State-reward Space

CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Revisiting Prioritized Experience Replay: A Value Perspective

Large-Scale Study of Curiosity-Driven Learning

Re-attentive experience replay in off-policy reinforcement learning

DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning

Replay across Experiments: A Natural Extension of Off-Policy RL

Prioritized Generative Replay