Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Licong Lin, Yu Bai, Song Mei
2024-05-26
Abstract: Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
Machine Learning, Artificial Intelligence, Computation and Language, Statistics Theory
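The abstract's first result says the supervised-pretrained model imitates the conditional expectation of the expert algorithm given the observed trajectory. A toy example shows what this means. The sketch below is a minimal illustration under hypothetical assumptions and does not reproduce the paper's setup:

- a hidden bit `theta` plays the role of the environment;
- a single noisy observation `obs` stands in for the in-context trajectory;
- the expert knows `theta` and always plays it, loosely like an expert that sees the environment.

Fitting a tabular policy by maximum likelihood recovers the posterior-averaged expert rather than the expert itself. This is the supervised-pretraining objective restricted to this tiny model class. All names (`NOISE`, `sample_episode`, `fit_tabular_policy`, `posterior_expert`) are illustrative.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy setup (not from the paper):
#   theta         hidden environment bit; it is also the expert's action
#   obs           noisy copy of theta, standing in for the in-context trajectory
#   expert action theta itself, since the expert observes the environment
NOISE = 0.2  # P(obs != theta)


def sample_episode(rng):
    """Draw (obs, expert_action) for one pretraining episode."""
    theta = rng.randint(0, 1)  # uniform prior over the two environments
    obs = theta if rng.random() >= NOISE else 1 - theta
    return obs, theta


def fit_tabular_policy(data):
    """Maximum-likelihood fit of pi(a | obs) over all tabular policies.

    For a categorical model the log-likelihood maximizer is the empirical
    frequency of the expert's actions within each context.
    """
    counts = defaultdict(Counter)
    for obs, action in data:
        counts[obs][action] += 1
    return {
        obs: {a: c[a] / sum(c.values()) for a in (0, 1)}
        for obs, c in counts.items()
    }


def posterior_expert(obs):
    """E[expert(. | theta) | obs], computed exactly with Bayes' rule."""
    likelihood = {t: (1 - NOISE) if t == obs else NOISE for t in (0, 1)}
    evidence = sum(0.5 * likelihood[t] for t in (0, 1))
    # The expert plays a = theta, so P(expert plays a | obs) = P(theta = a | obs).
    return {a: 0.5 * likelihood[a] / evidence for a in (0, 1)}


rng = random.Random(0)
data = [sample_episode(rng) for _ in range(50_000)]
learned = fit_tabular_policy(data)
for obs in (0, 1):
    print(obs, learned[obs], posterior_expert(obs))
```

Because `obs` reveals `theta` only with probability 0.8, the learned policy plays the expert's action with probability close to 0.8 in each context, not with certainty. This is the conditional-expectation behavior in miniature.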
What problem does this paper attempt to address?
This paper studies how transformer models trained by supervised pretraining on offline data can perform in-context reinforcement learning (ICRL). Transformers have shown strong ICRL capabilities in practice, but their theoretical understanding is still limited. It is unclear which reinforcement learning algorithms they can execute in context, and how they improve their policies from past experience in a new environment. The paper proposes a theoretical framework for analyzing ICRL via supervised pretraining. The framework covers both algorithm distillation and decision-pretrained transformers (DPT). The main contributions are as follows:
1. A general supervised-pretraining framework for meta-reinforcement learning that includes existing techniques such as algorithm distillation and decision-pretrained transformers.
2. A proof that the pretrained transformer imitates the conditional expectation of the expert algorithm given the observed trajectory. The generalization error scales with model capacity and with a distribution divergence factor between the expert and offline algorithms.
3. A demonstration that transformers with ReLU attention can efficiently approximate several near-optimal online reinforcement learning algorithms. These are LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes.
4. Sample-complexity guarantees for the pretrained transformers, together with the corresponding regret bounds.
5. Preliminary experiments that validate the ICRL performance of transformers in these settings.
The paper also discusses related work on transformers for decision making, unsupervised pretraining, and online learning, as well as on the expressive power of transformers and the statistical theory of imitation learning.
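The third contribution lists LinUCB as one of the near-optimal online algorithms that transformers can approximate. The algorithm itself is short enough to write out. The following is a standard textbook-style sketch in pure Python, not the paper's transformer construction.

These details are assumptions for illustration:

- the names `LinUCB` and `run`;
- the exploration weight `alpha` and the regularizer `lam`;
- Gaussian reward noise.

The inverse design matrix is kept up to date with Sherman–Morrison rank-one updates, so no general matrix inverse is needed.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in m]


class LinUCB:
    """LinUCB for a stochastic linear bandit with a fixed, finite action set.

    Maintains A^{-1}, where A = lam * I + sum_t x_t x_t^T, via Sherman-Morrison
    rank-one updates, and b = sum_t r_t x_t.
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha  # weight of the confidence bonus (hypothetical default)
        self.a_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def select(self, actions):
        """Return the index of the action with the largest upper confidence bound."""
        theta_hat = mat_vec(self.a_inv, self.b)  # ridge estimate A^{-1} b

        def ucb(x):
            width = math.sqrt(dot(x, mat_vec(self.a_inv, x)))
            return dot(x, theta_hat) + self.alpha * width

        return max(range(len(actions)), key=lambda k: ucb(actions[k]))

    def update(self, x, reward):
        """Rank-one update: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / (1 + x^T A^{-1} x)."""
        ax = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        n = len(x)
        self.a_inv = [[self.a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)] for i in range(n)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def run(theta_star, actions, horizon, noise=0.1, seed=0):
    """Simulate LinUCB on a linear bandit with Gaussian noise; return (agent, cumulative regret)."""
    rng = random.Random(seed)
    agent = LinUCB(len(theta_star))
    best = max(dot(x, theta_star) for x in actions)
    regret = 0.0
    for _ in range(horizon):
        x = actions[agent.select(actions)]
        mean = dot(x, theta_star)
        agent.update(x, mean + rng.gauss(0.0, noise))
        regret += best - mean
    return agent, regret


if __name__ == "__main__":
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    _, total_regret = run([1.0, 0.2], arms, horizon=2000)
    print(f"cumulative regret over 2000 rounds: {total_regret:.2f}")
```

LinUCB picks a suboptimal arm only when that arm's upper confidence bound exceeds the best arm's. Regret in `run` therefore accumulates only while the confidence widths are still large enough to make suboptimal arms look competitive.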
As part of these approximation results, the authors show how transformers can implement accelerated gradient descent and a matrix square root algorithm, which offers new insight into the expressive power of transformers. In summary, the paper explains how transformers can perform in-context reinforcement learning through supervised pretraining. It also provides theoretical guarantees for this approach and clarifies which reinforcement learning algorithms transformers can learn to execute.
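The two subroutines mentioned above, accelerated gradient descent and the matrix square root, can be illustrated with ordinary numerical counterparts in pure Python:

- Nesterov's accelerated gradient descent on a ridge-regression objective, the kind of regularized least-squares estimate that LinUCB and Thompson sampling for linear bandits maintain;
- the coupled Newton–Schulz iteration for a matrix square root, which needs only matrix products.

Both are assumptions chosen for illustration. They are not claimed to be the exact iterations the authors encode in attention layers. The step-size choices are simple conservative defaults: the trace of the Hessian is the smoothness bound and `lam` is the strong-convexity bound. For context, a square root of a covariance Sigma is what turns a standard normal draw z into a Gaussian posterior sample mu + Sigma^{1/2} z, as Thompson sampling requires.

```python
import math


def eye(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def ridge_agd(x, y, lam, iters=200):
    """Nesterov's accelerated gradient descent on 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = len(x[0])
    # Hessian H = X^T X + lam * I and linear term X^T y of the quadratic objective.
    hess = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0) for j in range(d)]
            for i in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(d)]
    smooth = sum(hess[i][i] for i in range(d))  # trace(H) >= largest eigenvalue
    strong = lam                                 # lam <= smallest eigenvalue
    root_kappa = math.sqrt(smooth / strong)
    momentum = (root_kappa - 1.0) / (root_kappa + 1.0)
    w = [0.0] * d
    v = [0.0] * d
    for _ in range(iters):
        # Gradient of the objective at the look-ahead point v: H v - X^T y.
        grad = [sum(hess[i][j] * v[j] for j in range(d)) - xty[i] for i in range(d)]
        w_next = [vi - gi / smooth for vi, gi in zip(v, grad)]
        v = [wn + momentum * (wn - wo) for wn, wo in zip(w_next, w)]
        w = w_next
    return w


def sqrtm_newton_schulz(a, iters=30):
    """Square root of a symmetric positive definite matrix via coupled Newton-Schulz.

    Uses only matrix products. Scaling by the Frobenius norm puts the
    eigenvalues in (0, 1], which keeps ||I - A/c|| < 1 so the iteration converges.
    """
    n = len(a)
    c = math.sqrt(sum(v * v for row in a for v in row))
    y = [[v / c for v in row] for row in a]  # converges to (A/c)^{1/2}
    z = eye(n)                               # converges to (A/c)^{-1/2}
    for _ in range(iters):
        zy = matmul(z, y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(n)]
             for i in range(n)]
        y, z = matmul(y, t), matmul(t, z)
    s = math.sqrt(c)
    return [[s * v for v in row] for row in y]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    Y = [1.0, 2.0, 3.0]
    print(ridge_agd(X, Y, lam=1.0))  # should be close to [0.875, 1.375]
    root = sqrtm_newton_schulz([[3.0, 1.0], [1.0, 3.0]])
    print(matmul(root, root))        # should be close to [[3, 1], [1, 3]]
```

On the example data, the iterate should match the closed-form ridge solution (X^T X + lam I)^{-1} X^T y = [0.875, 1.375] to many digits. Squaring the returned root should reproduce the input matrix.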