Settling the Sample Complexity of Online Reinforcement Learning

Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du
2024-05-24
Abstract: A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a "large-sample" regime, imposing enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory.
Machine Learning