Abstract: This work studies the question of representation learning in RL: how can we learn a compact low-dimensional representation such that, on top of the representation, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB (Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample complexity from $\widetilde{O}( A^9 d^7 / (\epsilon^{10} (1-\gamma)^{22}))$ for FLAMBE to $\widetilde{O}( A^2 d^4 / (\epsilon^2 (1-\gamma)^{5}) )$, with $d$ being the rank of the transition matrix (or dimension of the ground truth representation), $A$ being the number of actions, and $\gamma$ being the discount factor. Notably, REP-UCB is simpler than FLAMBE, as it directly balances the interplay between representation learning, exploration, and exploitation, while FLAMBE is an explore-then-commit style approach and has to perform reward-free exploration step-by-step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm is able to compete against any policy as long as it is covered by the offline distribution.
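
As a rough illustration of the two quantitative statements in the abstract, the low-rank assumption is commonly written as a rank-$d$ factorization of the transition kernel. The symbols $\phi^\star$ and $\mu^\star$ below are notation assumed here for illustration; the abstract itself only names the rank $d$:

$$P^\star(s' \mid s, a) = \langle \phi^\star(s, a), \mu^\star(s') \rangle, \qquad \phi^\star(s, a),\ \mu^\star(s') \in \mathbb{R}^d.$$

Dividing the FLAMBE bound by the REP-UCB bound, and ignoring the constants and logarithmic factors hidden by $\widetilde{O}$, gives a sense of the size of the improvement:

$$\frac{A^9 d^7 / (\epsilon^{10} (1-\gamma)^{22})}{A^2 d^4 / (\epsilon^{2} (1-\gamma)^{5})} = \frac{A^7 d^3}{\epsilon^{8} (1-\gamma)^{17}}.$$

Since both expressions are upper bounds, this ratio compares the stated guarantees only and is not a claim about the algorithms' actual sample usage.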

The Limits of Transfer Reinforcement Learning with Latent Low-rank Structure

Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

Representation Learning for Online and Offline RL in Low-rank MDPs

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning

Inverse Reinforcement Learning with Multiple Ranked Experts

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank

Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

Low-Rank MDPs with Continuous Action Spaces

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Lipschitz Lifelong Reinforcement Learning

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

Provably Efficient CVaR RL in Low-rank MDPs

Robust Knowledge Transfer in Tiered Reinforcement Learning

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure

Decoupling Dynamics and Reward for Transfer Learning

Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization