Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

Philip S. Thomas, Emma Brunskill
DOI: https://doi.org/10.48550/arXiv.1706.06643
2017-06-21
Abstract: We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).
Artificial Intelligence, Machine Learning