Abstract: Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with *local simulator access* (or, local planning), an RL protocol where the agent is allowed to reset to previously observed states and follow their dynamics during training. We use local simulator access to unlock new statistical guarantees that were previously out of reach:

What problem does this paper attempt to address?

The core problem this paper addresses is how to use local simulator access to design reinforcement learning algorithms that achieve sample-efficient online learning under general function approximation. Specifically, the paper focuses on two aspects:

1. **Sample-efficient MDP learning**:
   - The paper shows that Markov decision processes (MDPs) with low coverability can be learned sample-efficiently in the online setting, assuming only realizability of the optimal state-action value function (\(Q^\star\)).
   - This result is achieved through a new algorithm, SimGolf, which combines global optimism with local simulator access.
2. **The Exogenous Block MDP problem**:
   - As a direct application of the above result, the paper proves that the well-known Exogenous Block MDP (ExBMDP) problem is solvable under local simulator access.
   - The ExBMDP is a reinforcement learning setting with high-dimensional observations, in which the underlying dynamics of the system are low-dimensional but the observations are affected by temporally correlated exogenous noise.

In addition, the paper proposes a more computationally efficient and practical algorithm, RVFS (Recursive Value Function Search), which achieves sample complexity guarantees under strengthened statistical assumptions and applies to Exogenous Block MDPs with weakly correlated exogenous noise.

### Summary of key contributions

- **Sample-efficient MDP learning**: The SimGolf algorithm achieves sample-efficient learning in MDPs while requiring only \(Q^\star\)-realizability and coverability.
- **Solvability of Exogenous Block MDP**: It is proved for the first time that the ExBMDP problem is solvable under local simulator access.
- **Computationally efficient algorithm**: The RVFS algorithm achieves sample complexity guarantees under strengthened statistical assumptions and is more computationally efficient.

These contributions provide a theoretical basis for understanding the potential of local simulator access in large-scale, high-dimensional state spaces, as well as new tools and methods for practical applications.
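To make the local simulator protocol described above concrete, here is a minimal Python sketch. It illustrates only the access model (the agent may query transitions from states it has already observed, and from nowhere else); it is not an implementation of SimGolf or RVFS. The names `ChainMDP`, `LocalSimulator`, `query`, and the toy chain dynamics are hypothetical and chosen purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ChainMDP:
    """Toy chain environment (hypothetical, for illustration only)."""
    length: int = 5      # number of positions on the chain
    horizon: int = 5     # episode length
    slip: float = 0.2    # probability that the chosen move is reversed

    def initial_state(self):
        return (0, 0)  # (position, time step)

    def step(self, state, action, rng):
        pos, h = state
        if h >= self.horizon:
            raise ValueError("episode has already ended at this state")
        move = 1 if action == 1 else -1
        if rng.random() < self.slip:
            move = -move
        new_pos = min(max(pos + move, 0), self.length - 1)
        reward = 1.0 if new_pos == self.length - 1 else 0.0
        return (new_pos, h + 1), reward


class LocalSimulator:
    """Allows transition queries only from previously observed states."""

    def __init__(self, mdp, seed=0):
        self.mdp = mdp
        self.rng = random.Random(seed)
        self.visited = set()   # states the agent has observed so far
        self.num_queries = 0   # number of successful simulator calls

    def reset(self):
        state = self.mdp.initial_state()
        self.visited.add(state)
        return state

    def query(self, state, action):
        # Local access: the agent cannot jump to a state it has never seen.
        if state not in self.visited:
            raise ValueError("state was never observed; query rejected")
        next_state, reward = self.mdp.step(state, action, self.rng)
        self.visited.add(next_state)
        self.num_queries += 1
        return next_state, reward


if __name__ == "__main__":
    sim = LocalSimulator(ChainMDP())
    s0 = sim.reset()
    s1, r1 = sim.query(s0, 1)   # ordinary online step
    s1b, r1b = sim.query(s0, 1)  # reset to s0 and sample again: allowed
    print(s1, s1b, sim.num_queries)
```

The difference from standard online RL is the second `query(s0, 1)` call: the agent returns to a state it has already seen and draws a fresh transition, which plain episodic interaction does not permit. A global (generative-model) simulator would additionally accept arbitrary states, which this wrapper rejects.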

The Power of Resets in Online Reinforcement Learning

Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems

Sample Efficient Deep Reinforcement Learning via Local Planning

PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators

Hybrid Reinforcement Learning from Offline Observation Alone

Efficient Online Reinforcement Learning with Offline Data

Intelligent Switching for Reset-Free RL

When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Bypassing the Simulation-to-reality Gap: Online Reinforcement Learning using a Supervisor

Online Reinforcement Learning in Non-Stationary Context-Driven Environments

When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage

Improving Offline Reinforcement Learning with Inaccurate Simulators

COSBO: Conservative Offline Simulation-Based Policy Optimization

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Reinforcement Learning for Resilient Power Grids

Online and Offline Reinforcement Learning by Planning with a Learned Model

Reinforcement Learning in Agent-Based Market Simulation: Unveiling Realistic Stylized Facts and Behavior

Efficient and Stable Offline-to-online Reinforcement Learning Via Continual Policy Revitalization