Abstract: Deep reinforcement learning (RL) algorithms suffer severe performance degradation when interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and promising for boosting sample efficiency in RL. These methods usually rely on contrastive learning and data augmentation to train a transition model, which differs from how the model is used in RL, namely for value-based planning. Accordingly, the representations learned by these visual methods may be good for recognition but not optimal for estimating state value and solving the decision problem. To address this issue, we propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future state (also referred to as the "imagined state") based on the current one and a sequence of actions. Instead of aligning this imagined state with a real state returned by the environment, VCR applies a Q-value head to both states and obtains two distributions of action values. A distance between the two is then computed and minimized, forcing the imagined state to produce an action value prediction similar to that of the real state. We develop two implementations of this idea, for discrete and continuous action spaces, respectively. We conduct experiments on the Atari 100k and DeepMind Control Suite benchmarks to validate their effectiveness for improving sample efficiency. The results show that our methods achieve new state-of-the-art performance for search-free RL algorithms.
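
The value-consistency idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear encoder, transition model, and Q head, the layer sizes, the function names, and the use of mean squared error as the distance are all assumptions made for illustration, and gradient handling and training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
STATE_DIM, ACTION_DIM, LATENT_DIM = 8, 4, 16

# Hypothetical linear stand-ins for the encoder, transition model, and Q head.
W_enc = rng.normal(scale=0.1, size=(STATE_DIM, LATENT_DIM))
W_trans = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))
W_q = rng.normal(scale=0.1, size=(LATENT_DIM, ACTION_DIM))


def encode(obs):
    """Map an observation to a latent state."""
    return np.tanh(obs @ W_enc)


def imagine(z, actions):
    """Roll the latent state forward through a sequence of discrete actions."""
    for a in actions:
        one_hot = np.eye(ACTION_DIM)[a]
        z = np.tanh(np.concatenate([z, one_hot]) @ W_trans)
    return z


def q_head(z):
    """Predict one action value per discrete action."""
    return z @ W_q


def value_consistency_loss(obs, actions, future_obs):
    """Distance between action values of the imagined and the real future state.

    Assumption: mean squared error is used as the distance here; the abstract
    only says that "a distance is computed and minimized".
    """
    q_imagined = q_head(imagine(encode(obs), actions))
    q_real = q_head(encode(future_obs))
    return float(np.mean((q_imagined - q_real) ** 2))


obs = rng.normal(size=STATE_DIM)
future_obs = rng.normal(size=STATE_DIM)
actions = [1, 3, 0]
loss = value_consistency_loss(obs, actions, future_obs)
print(loss)
```

The point of the sketch is that the loss compares the Q head's outputs rather than the latent states themselves, so the imagined state only needs to agree with the real state on what matters for action values.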

Uncertainty-Aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Controlling Estimation Error in Reinforcement Learning via Reinforced Operation

Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Provably Efficient CVaR RL in Low-rank MDPs

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure

Towards Safe Reinforcement Learning Via Constraining Conditional Value-at-Risk

Value-Distributional Model-Based Reinforcement Learning

Model predictive control-based value estimation for efficient reinforcement learning

Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

An Information-Theoretic Optimality Principle for Deep Reinforcement Learning