On Representation Complexity of Model-based and Model-free Reinforcement Learning

Hanlin Zhu,Baihe Huang,Stuart Russell

2024-03-11

Abstract: We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity. We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant-depth circuits of polynomial size, while the optimal $Q$-function suffers an exponential circuit complexity in constant-depth circuits. By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective: in some cases, the ground-truth rule (model) of the environment is simple to represent, while other quantities, such as the $Q$-function, appear complex. We empirically corroborate our theory by comparing the approximation error of the transition kernel, reward function, and optimal $Q$-function in various MuJoCo environments, which demonstrates that the approximation errors of the transition kernel and reward function are consistently lower than those of the optimal $Q$-function. To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.

Machine Learning

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper investigates the representation complexity of model-based and model-free Reinforcement Learning (RL), measured in terms of circuit complexity. Specifically:

1. **Theoretical Contributions**:
   - The study shows that there exists a broad class of Markov Decision Processes (MDPs) whose transition and reward functions can be represented by circuits of constant depth and polynomial size, while the optimal Q-function requires exponential size when represented by constant-depth circuits.
   - This difference in representation complexity offers an explanation for why model-based algorithms usually have better sample complexity than model-free algorithms.
2. **Experimental Validation**:
   - The theory is corroborated empirically by comparing the approximation errors of the transition kernel, reward function, and optimal Q-function in various MuJoCo environments. The errors for the transition kernel and reward function are consistently lower than those for the optimal Q-function.
3. **Main Findings**:
   - In a broad class of MDPs, the transition and reward functions are far simpler to represent than the optimal Q-function, which is consistent with the better sample efficiency usually observed for model-based algorithms.

In summary, the paper uses circuit complexity to give a new perspective on the sample efficiency gap between model-based and model-free algorithms, and it offers a rigorous framework for future RL research.
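To make the distinction between representing the model and representing the optimal Q-function concrete, here is a minimal Python sketch. It is not the construction or the experiment from the paper: the bit-flip MDP, the names `transition`, `reward`, and `optimal_q`, and the constants `N_BITS` and `GAMMA` are all hypothetical choices made for illustration. The model is a short local rule, whereas the optimal Q-function is obtained here by iterating the Bellman optimality equation over a table with one entry per state-action pair. The toy does not demonstrate the circuit lower bound; it only shows which objects are being compared.

```python
from itertools import product

N_BITS = 3    # hypothetical toy size: states are 3-bit tuples
GAMMA = 0.9   # hypothetical discount factor


def transition(state, action):
    """Deterministic model: flip the bit at index `action`."""
    return tuple(b ^ 1 if i == action else b for i, b in enumerate(state))


def reward(state, action):
    """Reward 1 if the action leads to the all-ones state, else 0."""
    return 1.0 if all(transition(state, action)) else 0.0


def optimal_q(n_iters=200):
    """Approximate Q* by tabular value iteration on the toy MDP."""
    states = list(product((0, 1), repeat=N_BITS))
    actions = range(N_BITS)
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(n_iters):
        # Bellman optimality update:
        # Q(s, a) = r(s, a) + gamma * max_b Q(P(s, a), b)
        q = {
            (s, a): reward(s, a)
            + GAMMA * max(q[(transition(s, a), b)] for b in actions)
            for s in states
            for a in actions
        }
    return q


if __name__ == "__main__":
    q_star = optimal_q()
    print(len(q_star), "table entries for", 2 ** N_BITS, "states")
```

The table grows as `2 ** N_BITS * N_BITS` entries, while `transition` and `reward` stay the same size regardless of `N_BITS`. The paper's result concerns circuit representations rather than tables, so this is only an informal analogy.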
