Stochastic Variance-Reduced Majorization-Minimization Algorithms

Duy-Nhat Phan, Sedi Bartz, Nilabja Guha, Hung M. Phan
2023-05-11
Abstract: We study a class of nonconvex nonsmooth optimization problems in which the objective is a sum of two functions: one function is the average of a large number of differentiable functions, while the other function is proper, lower semicontinuous and has a surrogate function that satisfies standard assumptions. Such problems arise in machine learning and regularized empirical risk minimization applications. However, nonconvexity and the large-sum structure are challenging for the design of new algorithms. Consequently, effective algorithms for such scenarios are scarce. We introduce and study three stochastic variance-reduced majorization-minimization (MM) algorithms, combining the general MM principle with new variance-reduced techniques. We prove almost sure subsequential convergence of the generated sequence to a stationary point. We further show that our algorithms possess the best-known complexity bounds in terms of gradient evaluations. We demonstrate the effectiveness of our algorithms on sparse binary classification problems, sparse multi-class logistic regressions, and neural networks by employing several widely used and publicly available data sets.
Optimization and Control, Numerical Analysis
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses a class of nonconvex, nonsmooth optimization problems in which the objective function is the sum of two parts: one part is the average of a large number of differentiable functions, and the other is a proper, lower semicontinuous function that admits a surrogate function satisfying standard assumptions. Such problems are common in machine learning and regularized empirical risk minimization. Nonconvexity and the large-sum structure make algorithm design difficult, so effective algorithms for this setting are scarce.

To address this, the authors propose three stochastic variance-reduced Majorization-Minimization (MM) algorithms:

1. **MM-SAGA**: Combines the deterministic MM method with SAGA-style stochastic gradient updates.
2. **MM-SVRG**: Combines the deterministic MM method with loop-free SVRG stochastic gradient estimation.
3. **MM-SARAH**: Combines the deterministic MM method with loop-free SARAH stochastic gradient estimation.

For all three algorithms, the authors prove almost sure subsequential convergence of the generated sequence to a stationary point and show that the algorithms attain the best-known complexity bounds in terms of gradient evaluations. Experiments on publicly available data sets demonstrate their effectiveness on sparse binary classification, sparse multi-class logistic regression, and neural network training.
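
To make the combination of an MM step with a loop-free SVRG gradient estimator more concrete, the following is a minimal Python sketch on a toy problem. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the least-squares losses `f_i(x) = 0.5 * (a_i @ x - b_i)**2`, the L1 regularizer used as its own surrogate (so the MM step reduces to a proximal soft-thresholding step), the conservative step size, the snapshot probability `p = 1/n`, and the function names `objective`, `soft_threshold`, and `mm_svrg_sketch`. The toy problem is convex, whereas the paper targets nonconvex objectives.

```python
import numpy as np


def objective(A, b, x, lam):
    """Toy objective: average least-squares loss plus an L1 penalty."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def mm_svrg_sketch(A, b, lam=0.01, iters=3000, seed=0):
    """Illustrative MM step driven by a loop-free SVRG gradient estimator."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Each f_i is L_i-smooth with L_i = ||a_i||^2; use a conservative step (assumption).
    step = 1.0 / (6.0 * np.max(np.sum(A ** 2, axis=1)))
    p = 1.0 / n  # probability of refreshing the snapshot (assumption)

    x = np.zeros(d)
    snapshot = x.copy()
    full_grad = A.T @ (A @ snapshot - b) / n  # full gradient at the snapshot

    for _ in range(iters):
        i = rng.integers(n)
        a_i = A[i]
        # SVRG estimator: grad f_i(x) - grad f_i(snapshot) + full gradient at snapshot.
        g = a_i * (a_i @ x - b[i]) - a_i * (a_i @ snapshot - b[i]) + full_grad
        # Minimize the surrogate: linearized smooth part + quadratic term + L1 penalty.
        x = soft_threshold(x - step * g, step * lam)
        # Loop-free variant: refresh the snapshot with small probability
        # instead of running an explicit inner loop.
        if rng.random() < p:
            snapshot = x.copy()
            full_grad = A.T @ (A @ snapshot - b) / n
    return x
```

Calling `mm_svrg_sketch(A, b)` on a data matrix `A` of shape `(n, d)` and a target vector `b` of length `n` returns an approximate minimizer. A SAGA- or SARAH-style variant would change only how `g` is formed; the surrogate-minimization step would stay the same in this sketch.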