Abstract: This work studies representation learning in RL: how can we learn a compact low-dimensional representation on top of which RL procedures such as exploration and exploitation can be performed in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm, REP-UCB (Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample complexity from $\widetilde{O}( A^9 d^7 / (\epsilon^{10} (1-\gamma)^{22}))$ for FLAMBE to $\widetilde{O}( A^2 d^4 / (\epsilon^2 (1-\gamma)^{5}) )$, with $d$ being the rank of the transition matrix (or dimension of the ground truth representation), $A$ being the number of actions, and $\gamma$ being the discount factor. Notably, REP-UCB is simpler than FLAMBE, as it directly balances the interplay between representation learning, exploration, and exploitation, while FLAMBE is an explore-then-commit style approach and has to perform reward-free exploration step-by-step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm is able to compete against any policy as long as it is covered by the offline distribution.
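To get a feel for the size of the improvement stated in the abstract, the two sample-complexity expressions can be compared numerically. The sketch below is illustrative only: it drops the logarithmic factors and constants hidden by $\widetilde{O}(\cdot)$, and the function names and example parameter values are assumptions made here, not taken from the paper.

```python
def flambe_bound(A, d, eps, gamma):
    """Leading term of the FLAMBE bound quoted in the abstract:
    A^9 d^7 / (eps^10 (1 - gamma)^22), log factors and constants dropped."""
    return A**9 * d**7 / (eps**10 * (1 - gamma) ** 22)


def rep_ucb_bound(A, d, eps, gamma):
    """Leading term of the REP-UCB bound quoted in the abstract:
    A^2 d^4 / (eps^2 (1 - gamma)^5), log factors and constants dropped."""
    return A**2 * d**4 / (eps**2 * (1 - gamma) ** 5)


def improvement_factor(A, d, eps, gamma):
    """Ratio of the two leading terms.

    Algebraically this is A^7 d^3 / (eps^8 (1 - gamma)^17).
    """
    return flambe_bound(A, d, eps, gamma) / rep_ucb_bound(A, d, eps, gamma)


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    A, d, eps, gamma = 4, 10, 0.1, 0.9
    print(f"FLAMBE leading term : {flambe_bound(A, d, eps, gamma):.3e}")
    print(f"REP-UCB leading term: {rep_ucb_bound(A, d, eps, gamma):.3e}")
    print(f"Ratio               : {improvement_factor(A, d, eps, gamma):.3e}")
```

Because the hidden constants and log factors may differ between the two analyses, the ratio should be read as an order-of-magnitude indication of how the dependence on $A$, $d$, $\epsilon$, and $1-\gamma$ changes, not as a precise speed-up.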

On the Power of Multitask Representation Learning in Linear MDP

Provable Benefit of Multitask Representation Learning in Reinforcement Learning

Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes

The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback

Provable General Function Class Representation Learning in Multitask Bandits and MDPs

Offline Multitask Representation Learning for Reinforcement Learning

Learning Functions to Study the Benefit of Multitask Learning

Improved Active Multi-Task Representation Learning via Lasso

Regret Analysis of Multi-task Representation Learning for Linear-Quadratic Adaptive Control

Multi-task Batch Reinforcement Learning with Metric Learning

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

Near-Optimal Representation Learning for Linear Bandits and Linear RL

AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning

Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts

Representation Learning for Online and Offline RL in Low-rank MDPs

Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL

Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning

Multi-Task Imitation Learning for Linear Dynamical Systems