Abstract: Several classical adaptive optimization algorithms, such as line search and trust region methods, have been recently extended to stochastic settings where function values, gradients, and Hessians in some cases, are estimated via stochastic oracles. Unlike the majority of stochastic methods, these methods do not use a pre-specified sequence of step size parameters, but adapt the step size parameter according to the estimated progress of the algorithm and use it to dictate the accuracy required from the stochastic approximations. The requirements on stochastic approximations are, thus, also adaptive and the oracle costs can vary from iteration to iteration. The step size parameters in these methods can increase and decrease based on the perceived progress, but unlike the deterministic case they are not bounded away from zero due to possible oracle failures, and bounds on the step size parameter have not been previously derived. This creates obstacles in the total complexity analysis of such methods, because the oracle costs are typically decreasing in the step size parameter, and could be arbitrarily large as the step size parameter goes to 0. Thus, until now only the total iteration complexity of these methods has been analyzed. In this paper, we derive a lower bound on the step size parameter that holds with high probability for a large class of adaptive stochastic methods. We then use this lower bound to derive a framework for analyzing the expected and high probability total oracle complexity of any method in this class. Finally, we apply this framework to analyze the total sample complexity of two particular algorithms, STORM and SASS, in the expected risk minimization problem.

What problem does this paper attempt to address?

This paper addresses how to analyze the total sample complexity of adaptive optimization algorithms that rely on stochastic estimators (for example, stochastic gradient or Hessian estimates). In these methods, the step-size parameter can become very small when the stochastic oracle fails, and the cost of an iteration (the number of samples needed for the estimator to reach the required accuracy) typically grows as the step-size parameter shrinks. Without a bound on how small the step-size parameter can get, the per-iteration cost can be arbitrarily large, which obstructs any analysis of total complexity.

### Main problems

1. **Lower bound on the step-size parameter**: A core problem in the paper is to derive a high-probability lower bound on the step-size parameter in adaptive stochastic methods. This bound is what makes it possible to bound the total sample complexity: it keeps the step-size parameter from approaching zero, so the cost of each iteration cannot grow without limit.
2. **Total sample complexity analysis**: Building on the lower bound, the paper proposes a framework for analyzing the total sample complexity of such methods, both in expectation and with high probability. The framework covers a large class of adaptive stochastic methods and is applied to two algorithms in particular, STORM (STochastic Optimization with Random Models, a trust-region method) and SASS (Stochastic Adaptive Step Search), in the expected risk minimization problem.

### Background and motivation

- **Adaptive optimization algorithms**: Classical adaptive optimization algorithms, such as line-search and trust-region methods, have been very successful in the deterministic setting. They use local models and a step-size parameter to determine the direction and length of the next step, and they adjust the step-size parameter according to the progress the algorithm makes.
- **Challenges in the stochastic setting**: When these algorithms are extended to the stochastic setting, function values, gradients, and (in some cases) Hessians are obtained from stochastic oracles. The step-size parameter is still adjusted adaptively, based on the estimated progress, but unlike in the deterministic case it is no longer bounded away from zero, because the oracle can fail. A very small step-size parameter can in turn cause a sharp increase in the cost of an iteration.
- **Limitations of existing research**: Previous work analyzed only the total iteration complexity of these methods and did not account for the cost of each iteration, that is, the total sample complexity. This leaves the assessment of the algorithms' overall efficiency incomplete.

### Main contributions of the paper

1. **High-probability lower bound on the step-size parameter**: The paper derives a high-probability lower bound on the step-size parameter by coupling the step-size stochastic process with a one-sided random walk.
2. **Total sample complexity analysis framework**: Based on this lower bound, the paper proposes a framework for analyzing the total sample complexity of adaptive stochastic methods.
3. **Application to specific algorithms**: The paper applies the framework to STORM and SASS, obtains their total sample complexity in the expected risk minimization problem, and shows that these complexities essentially match the complexity lower bound for first-order algorithms.

### Conclusion

By bounding the step-size parameter away from zero with high probability, the paper provides a theoretical basis for analyzing the total sample complexity of adaptive stochastic optimization algorithms. This helps explain the practical performance of these algorithms and supplies tools for future analyses.
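The step-size behavior summarized above can be illustrated with a toy simulation. The sketch below is not the paper's algorithm or its proof construction; it is a simplified model under stated assumptions: the oracle is accurate with a fixed probability `p_accurate`, the step size is multiplied by `gamma` whenever the oracle is accurate and the step size is at most a threshold `alpha_bar`, and it is divided by `gamma` in every other case (a pessimistic choice). The per-iteration sample cost of `1 / alpha**2` is likewise a hypothetical stand-in for "cost decreasing in the step size". All names and default values are illustrative.

```python
import random


def simulate_step_sizes(n_iters=10_000, p_accurate=0.8, gamma=2.0,
                        alpha0=1.0, alpha_bar=1.0, seed=0):
    """Toy model of an adaptive step-size process (illustrative assumption).

    The step size grows by a factor gamma when the oracle is accurate and
    the step size is at most alpha_bar; otherwise it shrinks by gamma.
    On a log scale this is a random walk that drifts upward below alpha_bar.
    """
    rng = random.Random(seed)
    alpha = alpha0
    alphas = []
    for _ in range(n_iters):
        alphas.append(alpha)
        accurate = rng.random() < p_accurate
        if accurate and alpha <= alpha_bar:
            alpha *= gamma  # step treated as successful: increase step size
        else:
            alpha /= gamma  # pessimistic case: decrease step size
    return alphas


def total_sample_cost(alphas):
    """Hypothetical cost model: samples per iteration scale as 1 / alpha^2."""
    return sum(1.0 / (a * a) for a in alphas)


if __name__ == "__main__":
    alphas = simulate_step_sizes()
    print("smallest step size seen:", min(alphas))
    print("total sample cost:", total_sample_cost(alphas))
```

Running this with different values of `p_accurate` shows the qualitative point: when the oracle is accurate more than half the time, the smallest step size observed over many iterations stays far from zero, so the summed cost stays moderate; as `p_accurate` drops, the step size can collapse and the cost under this model grows rapidly.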

Sample Complexity Analysis for Adaptive Optimization Algorithms with Stochastic Oracles
