Abstract: In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, which aims to estimate the complete return distribution (denoted $\eta^\pi$) attained by a given policy $\pi$. Assuming a generative model is available, we construct our estimator $\hat\eta^\pi$ by the certainty-equivalence method. In this setting, a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2p}(1-\gamma)^{2p+2}}\right)$ suffices to guarantee that the $p$-Wasserstein metric between $\hat\eta^\pi$ and $\eta^\pi$ is less than $\varepsilon$ with high probability. This implies that the distributional policy evaluation problem can be solved sample-efficiently. We also show that, under different mild assumptions, a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2}(1-\gamma)^{4}}\right)$ suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat\eta^\pi$ and $\eta^\pi$ are below $\varepsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat\eta^\pi$. We demonstrate that the ``empirical process'' $\sqrt{n}(\hat\eta^\pi-\eta^\pi)$ converges weakly to a Gaussian process in the space of bounded functionals on the Lipschitz function class $\ell^\infty(\mathcal{F}_{\text{W}})$ and, when some mild conditions hold, also in the spaces of bounded functionals on the indicator function class $\ell^\infty(\mathcal{F}_{\text{KS}})$ and on the bounded measurable function class $\ell^\infty(\mathcal{F}_{\text{TV}})$. Our findings give rise to a unified approach to statistical inference for a wide class of statistical functionals of $\eta^\pi$.
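
To make the certainty-equivalence idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a small tabular problem in which the policy $\pi$ is already folded into a state-to-state kernel `P`, rewards are deterministic and lie in $[0,1]$, and the return distribution of the empirical model is approximated by truncated Monte Carlo rollouts instead of being computed as an exact fixed point. All function names (`empirical_model`, `sample_returns`, `wasserstein1`) and the toy two-state problem are hypothetical and chosen for illustration only.

```python
import numpy as np

def empirical_model(P, n, rng):
    """Query the generative model n times per state and return the
    empirical transition matrix (the certainty-equivalent model)."""
    S = P.shape[0]
    P_hat = np.zeros((S, S))
    for s in range(S):
        draws = rng.choice(S, size=n, p=P[s])          # n next-state samples from P(.|s)
        P_hat[s] = np.bincount(draws, minlength=S) / n  # empirical frequencies
    return P_hat

def sample_returns(P, r, gamma, s0, horizon, m, rng):
    """Draw m Monte Carlo samples of the discounted return from s0,
    truncated after `horizon` steps, under transition matrix P."""
    S = P.shape[0]
    cdf = np.cumsum(P, axis=1)
    state = np.full(m, s0)
    G = np.zeros(m)
    discount = 1.0
    for _ in range(horizon):
        G += discount * r[state]
        u = rng.random(m)
        # inverse-CDF sampling of the next state, clamped against rounding
        state = np.minimum((u[:, None] > cdf[state]).sum(axis=1), S - 1)
        discount *= gamma
    return G

def wasserstein1(x, y):
    """1-Wasserstein distance between two empirical samples of equal size."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy kernel (assumption)
    r = np.array([0.0, 1.0])                # toy rewards in [0, 1] (assumption)
    gamma = 0.9
    G_true = sample_returns(P, r, gamma, 0, 100, 5000, rng)
    for n in (10, 100, 1000):
        P_hat = empirical_model(P, n, rng)
        G_hat = sample_returns(P_hat, r, gamma, 0, 100, 5000, rng)
        print(n, wasserstein1(G_hat, G_true))
```

The printed distances include Monte Carlo noise from the rollouts, so they only roughly illustrate how the estimate behaves as the number of samples `n` per state grows.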

Distributional Bellman Operators over Mean Embeddings

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

A Distributional Perspective on Reinforcement Learning

Foundations of Multivariate Distributional Reinforcement Learning

Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

On solutions of the distributional Bellman equation

Bayesian Distributional Policy Gradients

Off-Policy Reinforcement Learning with High Dimensional Reward

Towards Understanding Distributional Reinforcement Learning: Regularization, Optimization, Acceleration and Sinkhorn Algorithm

Exploration by Distributional Reinforcement Learning

Adversarial Learning of Distributional Reinforcement Learning

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space

Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Distributional Reinforcement Learning with Regularized Wasserstein Loss

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

Value-Distributional Model-Based Reinforcement Learning

Estimation and Inference in Distributional Reinforcement Learning

A Distributional Analogue to the Successor Representation

Interpreting Distributional Reinforcement Learning: A Regularization Perspective

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression