Sample Complexity Analysis for Adaptive Optimization Algorithms with Stochastic Oracles

Billy Jin, Katya Scheinberg, Miaolan Xie
2023-09-29
Abstract: Several classical adaptive optimization algorithms, such as line search and trust region methods, have been recently extended to stochastic settings where function values, gradients, and Hessians in some cases, are estimated via stochastic oracles. Unlike the majority of stochastic methods, these methods do not use a pre-specified sequence of step size parameters, but adapt the step size parameter according to the estimated progress of the algorithm and use it to dictate the accuracy required from the stochastic approximations. The requirements on stochastic approximations are, thus, also adaptive and the oracle costs can vary from iteration to iteration. The step size parameters in these methods can increase and decrease based on the perceived progress, but unlike the deterministic case they are not bounded away from zero due to possible oracle failures, and bounds on the step size parameter have not been previously derived. This creates obstacles in the total complexity analysis of such methods, because the oracle costs are typically decreasing in the step size parameter, and could be arbitrarily large as the step size parameter goes to 0. Thus, until now only the total iteration complexity of these methods has been analyzed. In this paper, we derive a lower bound on the step size parameter that holds with high probability for a large class of adaptive stochastic methods. We then use this lower bound to derive a framework for analyzing the expected and high probability total oracle complexity of any method in this class. Finally, we apply this framework to analyze the total sample complexity of two particular algorithms, STORM and SASS, in the expected risk minimization problem.
Optimization and Control
What problem does this paper attempt to address?
This paper addresses how to analyze the total sample complexity of adaptive optimization algorithms whose function values, gradients, and (in some cases) Hessians are estimated by stochastic oracles. In these methods the step size parameter can become very small when the oracles fail, and the per-iteration cost (the number of samples needed to reach the required oracle accuracy) grows as the step size parameter shrinks. Without a bound on how small the step size parameter can get, the per-iteration cost could be arbitrarily large, which blocks any analysis of total complexity.

### Main problems

1. **Lower bound on the step size parameter**: A core problem in the paper is to derive a lower bound on the step size parameter that holds with high probability for adaptive stochastic methods. This bound is what keeps the total sample complexity bounded: it prevents the step size parameter from approaching zero, and therefore prevents the per-iteration cost from growing without limit.
2. **Total sample complexity analysis**: Using this lower bound, the paper builds a framework for analyzing the total oracle complexity of such methods, both in expectation and with high probability. The framework applies to a large class of adaptive stochastic methods and is applied to two specific algorithms, STORM (Stochastic Trust-region Optimization with Random Models) and SASS (Stochastic Adaptive Step Search), in the expected risk minimization problem.

### Background and motivation

- **Adaptive optimization algorithms**: Classical adaptive optimization algorithms, such as line search and trust region methods, have been very successful in deterministic settings. They choose the direction and length of the next step using local models and a step size parameter, and adjust the step size parameter according to the progress achieved.
- **Challenges in stochastic settings**: When these algorithms are extended to stochastic settings, function values, gradients, and Hessians are obtained from stochastic oracles. The step size parameter is still adjusted according to the perceived progress of the algorithm, but unlike the deterministic case it is not bounded away from zero, because oracle failures can drive it down. Since oracle costs typically increase as the step size parameter decreases, the cost of an iteration can rise sharply.
- **Limitations of existing research**: Until now, only the total iteration complexity of these methods has been analyzed, without accounting for the cost of each iteration, that is, the total sample complexity. This leaves the overall efficiency evaluation of the algorithms incomplete.

### Main contributions of the paper

1. **High-probability lower bound on the step size parameter**: The paper derives a high-probability lower bound on the step size parameter by coupling the stochastic process with a one-sided random walk.
2. **Total sample complexity analysis framework**: Based on this lower bound, the paper proposes a framework for analyzing the total sample complexity of adaptive stochastic methods.
3. **Application to specific algorithms**: The paper applies the framework to STORM and SASS, obtains their total sample complexity in the expected risk minimization problem, and shows that these complexities are essentially consistent with the complexity lower bound for first-order algorithms.

### Conclusion

By bounding the step size parameter away from zero with high probability, the paper provides a theoretical basis for the total sample complexity analysis of adaptive stochastic optimization algorithms. This helps explain the performance of these algorithms in practice and provides tools for future research.
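
The role of a step size lower bound can be illustrated with a small simulation. The sketch below is a toy model, not the paper's algorithm or proof: it assumes the step size parameter is multiplied by a factor `gamma` on a successful iteration (capped at a hypothetical threshold `alpha_bar`) and divided by `gamma` otherwise, that iterations succeed independently with probability `p_success` greater than 1/2, and that the per-iteration oracle cost is `1 / alpha**2`. All of these names, values, and the cost model are assumptions made for illustration.

```python
import random


def simulate_step_sizes(n_iters, p_success=0.7, gamma=2.0,
                        alpha0=1.0, alpha_bar=1.0, seed=0):
    """Toy one-sided random walk on the step size parameter.

    Returns the smallest step size seen and the total (hypothetical)
    oracle cost accumulated over n_iters iterations.
    """
    rng = random.Random(seed)
    alpha = alpha0
    min_alpha = alpha
    total_cost = 0.0
    for _ in range(n_iters):
        # Assumed cost model: the oracle gets more expensive as alpha shrinks.
        total_cost += 1.0 / alpha ** 2
        if rng.random() < p_success:
            # Successful iteration: increase alpha, capped at alpha_bar.
            alpha = min(gamma * alpha, alpha_bar)
        else:
            # Unsuccessful iteration (e.g. oracle failure): decrease alpha.
            alpha = alpha / gamma
        min_alpha = min(min_alpha, alpha)
    return min_alpha, total_cost


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        min_alpha, total_cost = simulate_step_sizes(n)
        print(f"n={n:6d}  min step size={min_alpha:.3e}  total cost={total_cost:.3e}")
```

Running the script prints the smallest step size and the accumulated cost for several horizons. In this toy model the walk drifts upward because `p_success` exceeds 1/2, so long runs of consecutive decreases are rare; this is the intuition behind a lower bound that holds with high probability rather than deterministically.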