The Power of Resets in Online Reinforcement Learning

Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
2024-04-26
Abstract: Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning), an RL protocol where the agent is allowed to reset to previously observed states and follow their dynamics during training. We use local simulator access to unlock new statistical guarantees that were previously out of reach:
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper asks how local simulator access can be used to design sample-efficient online reinforcement learning algorithms under general function approximation. It focuses on two aspects:

1. **Sample-efficient MDP learning**
   - The paper shows that Markov decision processes (MDPs) with low coverability can be learned sample-efficiently online, assuming only realizability of the optimal state-action value function (\(Q^\star\)).
   - This result is achieved through a new algorithm, SimGolf, which combines global optimism with local simulator access.
2. **The Exogenous Block MDP problem**
   - As a direct application of the result above, the paper proves that the well-known Exogenous Block MDP (ExBMDP) problem is solvable under local simulator access.
   - The ExBMDP is a reinforcement learning setting with high-dimensional observations, in which the underlying dynamics of the system are low-dimensional but the observations are affected by temporally correlated exogenous noise.

In addition, the paper proposes a more computationally efficient and practical algorithm, RVFS (Recursive Value Function Search), which achieves sample complexity guarantees under strengthened statistical assumptions and applies to Exogenous Block MDPs with weakly correlated exogenous noise.

### Summary of key contributions

- **Sample-efficient MDP learning**: The SimGolf algorithm achieves sample-efficient learning in MDPs while requiring only \(Q^\star\)-realizability and coverability.
- **Solvability of the Exogenous Block MDP**: It is proved for the first time that the ExBMDP problem is solvable under local simulator access.
- **Computationally efficient algorithm**: The RVFS algorithm achieves sample complexity guarantees under strengthened statistical assumptions and is more computationally efficient.

These contributions provide a theoretical basis for understanding the potential of local simulator access in large-scale, high-dimensional state spaces, along with new tools and methods for practical applications.
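To make the local simulator protocol from the abstract concrete, here is a minimal Python sketch of an environment wrapper that lets the agent reset only to states it has already observed. It is an illustration under stated assumptions, not an implementation of SimGolf or RVFS: the toy `ChainMDP`, the `LocalSimulator` class, its `reset_to` method, and the `monte_carlo_q` helper are all hypothetical names introduced here.

```python
import random


class ChainMDP:
    """Toy finite-horizon chain MDP (hypothetical, for illustration only)."""

    def __init__(self, n_states=5, horizon=4, seed=0):
        self.n_states = n_states
        self.horizon = horizon
        self.rng = random.Random(seed)

    def initial_state(self):
        return (0, 0)  # (position, timestep)

    def transition(self, state, action):
        pos, t = state
        # Action 1 moves right, action 0 moves left; 10% of moves are flipped.
        step = 1 if action == 1 else -1
        if self.rng.random() < 0.1:
            step = -step
        new_pos = min(max(pos + step, 0), self.n_states - 1)
        reward = 1.0 if new_pos == self.n_states - 1 else 0.0
        return (new_pos, t + 1), reward


class LocalSimulator:
    """Online access plus resets restricted to previously observed states."""

    def __init__(self, mdp):
        self.mdp = mdp
        self.seen = set()   # every state the agent has observed so far
        self.state = None

    def reset(self):
        # Standard online-RL reset to the initial state.
        self.state = self.mdp.initial_state()
        self.seen.add(self.state)
        return self.state

    def reset_to(self, state):
        # Local simulator access: arbitrary states are not allowed,
        # only states that were reached earlier in training.
        if state not in self.seen:
            raise ValueError("can only reset to a previously observed state")
        self.state = state
        return self.state

    def step(self, action):
        if self.state is None or self.state[1] >= self.mdp.horizon:
            raise RuntimeError("call reset() or reset_to() before stepping")
        next_state, reward = self.mdp.transition(self.state, action)
        self.seen.add(next_state)
        self.state = next_state
        done = next_state[1] >= self.mdp.horizon
        return next_state, reward, done


def monte_carlo_q(sim, state, action, policy, n_rollouts=100):
    """Estimate the return of (state, action) under `policy` via repeated resets."""
    total = 0.0
    for _ in range(n_rollouts):
        sim.reset_to(state)
        s, r, done = sim.step(action)
        ret = r
        while not done:
            s, r, done = sim.step(policy(s))
            ret += r
        total += ret
    return total / n_rollouts
```

For example, after `sim = LocalSimulator(ChainMDP())` and `s0 = sim.reset()`, calling `monte_carlo_q(sim, s0, 1, lambda s: 1)` averages many rollouts that all start from the same observed state. Repeated sampling from one state is exactly what plain online access does not permit and what the reset capability adds.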