Abstract:Stochastic Gradient (SG) is the defacto iterative technique to solve stochastic optimization (SO) problems with a smooth (non-convex) objective $f$ and a stochastic first-order oracle. SG's attractiveness is due in part to its simplicity of executing a single step along the negative subsampled gradient direction to update the incumbent iterate. In this paper, we question SG's choice of executing a single step as opposed to multiple steps between subsample updates. Our investigation leads naturally to generalizing SG into Retrospective Approximation (RA) where, during each iteration, a "deterministic solver" executes possibly multiple steps on a subsampled deterministic problem and stops when further solving is deemed unnecessary from the standpoint of statistical efficiency. RA thus rigorizes what is appealing for implementation -- during each iteration, "plug in" a solver, e.g., L-BFGS line search or Newton-CG, as is, and solve only to the extent necessary. We develop a complete theory using relative error of the observed gradients as the principal object, demonstrating that almost sure and $L_1$ consistency of RA are preserved under especially weak conditions when sample sizes are increased at appropriate rates. We also characterize the iteration and oracle complexity (for linear and sub-linear solvers) of RA, and identify a practical termination criterion leading to optimal complexity rates. To subsume non-convex $f$, we present a certain "random central limit theorem" that incorporates the effect of curvature across all first-order critical points, demonstrating that the asymptotic behavior is described by a certain mixture of normals. The message from our numerical experiments is that the ability of RA to incorporate existing second-order deterministic solvers in a strategic manner might be important from the standpoint of dispensing with hyper-parameter tuning.

Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint

A policy gradient approach for optimization of smooth risk measures

Safe Reinforcement Learning Using Finite-Horizon Gradient-based Estimation

Stochastic Cubic-Regularized Policy Gradient Method

Stochastic Variance-Reduced Policy Gradient

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Policy Gradient for Reinforcement Learning with General Utilities

Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift

On the Second-Order Convergence of Biased Policy Gradient Algorithms

The Reinforce Policy Gradient Algorithm Revisited

On the Linear Convergence of Natural Policy Gradient Algorithm

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation.

Risk-Sensitive Reinforcement Learning via Policy Gradient Search

Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis

A Retrospective Approximation Approach for Smooth Stochastic Optimization

Policy Gradient Method For Robust Reinforcement Learning

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space