A multilevel stochastic regularized first-order method with application to training

Filippo Marini, Margherita Porcelli, Elisa Riccietti
2024-12-16
Abstract: In this paper, we propose a new multilevel stochastic framework for the solution of optimization problems. The proposed approach uses random regularized first-order models that exploit an available hierarchical description of the problem, either in the classical variable space or in the function space, meaning that different levels of accuracy for the objective function are available. The convergence analysis of the method is conducted and its numerical behavior is tested on the solution of finite-sum minimization problems. The multilevel framework is tailored to the solution of such problems, resulting in a nontrivial variance reduction technique with adaptive step-size that outperforms standard approaches when solving nonconvex problems. Unlike classical deterministic multilevel methods, our stochastic method does not require the finest approximation to coincide with the original objective function. This makes it possible to avoid evaluating the full sum in finite-sum minimization problems, opening the way to the solution of classification problems with large data sets.
Optimization and Control, Numerical Analysis
What problem does this paper attempt to address?
The paper addresses large-scale stochastic optimization problems, especially those in which the objective function can only be evaluated with noise. It proposes a new multilevel stochastic regularized first-order method that exploits a hierarchical description of the problem, either in the classical variable space or in the function space, so that representations of the objective function with different accuracies are available at different levels.

### Main problems

1. **Limitations of existing methods**:
   - Traditional multilevel methods apply only to deterministic settings and cannot handle stochastic optimization problems.
   - Existing multilevel methods usually rely on a hierarchy in the variable space, such as the choice of a grid when discretizing an infinite-dimensional problem. In modern applications, the limiting factor is more often the accuracy of the function estimate than the size of the model.
2. **Classification problems with large data sets**:
   - With large data sets, traditional methods need to evaluate the full sum, which is time-consuming and often infeasible in practice.

### Solutions

The paper extends multilevel methods to the stochastic setting and allows the hierarchy to be built in the "function space", that is, from function approximations of different accuracies. The hierarchy can therefore be constructed either in the variable space or from function approximations whose noise is progressively reduced.

### Key contributions

1. **First extension of the multilevel method to the stochastic framework**: Overcomes the restriction of existing methods to deterministic cases.
2. **Allows the construction of a hierarchical structure in the function space**: Considers function approximations with different accuracies.
3. **Relaxes a requirement of the classical deterministic multilevel theory**: The objective function at the finest level need not coincide with the original objective function, so the method applies to problems too large for the full objective to be evaluated.
4. **Proposes a variance reduction technique for the finite-sum minimization problem**: It includes adaptive step-size selection and outperforms mini-batch SVRG on nonconvex problems.
5. **Provides the first stochastic analysis of the first-order adaptive regularization method**: The analysis covers the classical single-level case.

### Application background

The method is particularly suited to classification problems on large data sets, such as the training problems common in deep learning. By reducing the variance and selecting the step size adaptively, it can speed up the solution while maintaining accuracy.

### Mathematical formulas

- The finite-sum form of the objective function:
  \[ \min_{x\in\mathbb{R}^n}\frac{1}{N}\sum_{i = 1}^{N}f_i(x) \]
  where \(f_i:\mathbb{R}^n\rightarrow\mathbb{R}\) are smooth functions that are bounded below.
- The form of the regularized model:
  \[ m_{R,\ell}^k(s)=m_\ell^k(s)+\lambda_\ell^k\|\nabla_x f_\ell(x_\ell^k)\|^2\|s\|^2 \]

With these improvements, the method handles large-scale stochastic optimization problems well and, on nonconvex problems in particular, outperforms the existing mini-batch SVRG method.
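To make the idea of a function-space hierarchy for the finite-sum objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each level is a subsampled average over a nested index set of increasing size and that the terms are least-squares losses \(f_i(x)=\tfrac12(a_i^\top x-y_i)^2\). The names `make_levels` and `subsampled_loss_and_grad`, the batch sizes, and the synthetic data are all hypothetical.

```python
import numpy as np


def make_levels(n_samples, batch_sizes, seed=0):
    """Draw nested index sets, one per level, ordered from coarse to fine.

    Nesting the sets is an illustrative assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return [perm[:b] for b in sorted(batch_sizes)]


def subsampled_loss_and_grad(x, A, y, idx):
    """Average of f_i(x) = 0.5 * (a_i^T x - y_i)^2 and its gradient over idx."""
    residual = A[idx] @ x - y[idx]
    loss = 0.5 * np.mean(residual ** 2)
    grad = A[idx].T @ residual / len(idx)
    return loss, grad


# Synthetic data: N terms in the sum, n variables (hypothetical sizes).
rng = np.random.default_rng(1)
N, n = 1000, 5
A = rng.standard_normal((N, n))
y = A @ np.ones(n) + 0.1 * rng.standard_normal(N)
x = np.zeros(n)

# Three levels of accuracy; the finest level (500 samples) is still
# cheaper than the full sum over all N terms.
levels = make_levels(N, batch_sizes=[10, 100, 500])
full_loss, full_grad = subsampled_loss_and_grad(x, A, y, np.arange(N))

for idx in levels:
    loss, grad = subsampled_loss_and_grad(x, A, y, idx)
    # Report the level size, its loss estimate, and the gradient error
    # relative to the full-sum gradient.
    print(len(idx), loss, np.linalg.norm(grad - full_grad))
```

In this sketch the finest level deliberately uses fewer than \(N\) terms, mirroring the point that the finest approximation need not coincide with the original objective. How the levels are actually built and combined in the paper is not specified in this summary.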