Stochastic Variance-Reduced Majorization-Minimization Algorithms

Duy-Nhat Phan, Sedi Bartz, Nilabja Guha, Hung M. Phan

2023-05-11

Abstract: We study a class of nonconvex nonsmooth optimization problems in which the objective is a sum of two functions: one function is the average of a large number of differentiable functions, while the other function is proper, lower semicontinuous and has a surrogate function that satisfies standard assumptions. Such problems arise in machine learning and regularized empirical risk minimization applications. However, nonconvexity and the large-sum structure are challenging for the design of new algorithms. Consequently, effective algorithms for such scenarios are scarce. We introduce and study three stochastic variance-reduced majorization-minimization (MM) algorithms, combining the general MM principle with new variance-reduced techniques. We establish almost sure subsequential convergence of the generated sequence to a stationary point. We further show that our algorithms possess the best-known complexity bounds in terms of gradient evaluations. We demonstrate the effectiveness of our algorithms on sparse binary classification problems, sparse multi-class logistic regressions, and neural networks by employing several widely used and publicly available data sets.
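The problem class described in the abstract can be written compactly as follows. This is only a sketch of the setting: the symbols $n$, $f_i$, $r$, $u$, and the domain $\mathbb{R}^d$ are notation assumed here for illustration, and the majorization conditions shown are the generic MM ones, not necessarily the exact assumptions used in the paper.

```latex
% Composite finite-sum objective: smooth average plus a nonsmooth term
\min_{x \in \mathbb{R}^d} \; F(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x) + r(x),
\qquad f_i \text{ differentiable}, \quad r \text{ proper and lower semicontinuous}.

% Generic surrogate (majorizer) u of r in the MM sense:
u(x, y) \ge r(x) \quad \text{for all } x, y,
\qquad u(y, y) = r(y) \quad \text{for all } y.
```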

Optimization and Control, Numerical Analysis

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper addresses a class of nonconvex, nonsmooth optimization problems whose objective is the sum of two parts: the average of a large number of differentiable functions, and a proper, lower semicontinuous function that admits a surrogate function satisfying standard assumptions. Such problems are common in machine learning and regularized empirical risk minimization. However, nonconvexity and the large-sum structure make algorithm design challenging.

To address this, the authors propose three majorization-minimization (MM) algorithms based on stochastic variance reduction:

1. **MM-SAGA**: Combines the deterministic MM method with SAGA-style stochastic gradient updates.
2. **MM-SVRG**: Combines the deterministic MM method with loop-free SVRG stochastic gradient estimation.
3. **MM-SARAH**: Combines the deterministic MM method with loop-free SARAH stochastic gradient estimation.

For these algorithms, the authors prove almost sure subsequential convergence to a stationary point and show that they attain the best-known complexity bounds in terms of gradient evaluations. The experiments demonstrate their effectiveness on sparse binary classification, sparse multi-class logistic regression, and neural network training.
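To make the idea of pairing an MM step with a loop-free SVRG gradient estimator more concrete, here is a minimal Python sketch. It is not the authors' MM-SVRG algorithm: it assumes a simple convex test problem (least squares with an L1 penalty), uses a quadratic upper bound as the surrogate for the smooth part and the L1 term itself as the nonsmooth part, so that minimizing the surrogate reduces to a soft-thresholding step. The function names, the step size rule, and the snapshot probability `p` are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(z, tau):
    """Closed-form minimizer of 0.5*||x - z||^2 + tau*||x||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def objective(A, b, x, lam):
    """F(x) = (1/n) * sum_i 0.5*(a_i @ x - b_i)^2 + lam*||x||_1."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def mm_loopless_svrg(A, b, lam=0.01, iters=2000, p=None, seed=0):
    """Illustrative MM-style iteration with a loop-free SVRG estimator.

    Assumption: f_i(x) = 0.5*(a_i @ x - b_i)^2, r(x) = lam*||x||_1.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Smoothness constant of each f_i is ||a_i||^2; use the largest one.
    L = np.max(np.sum(A ** 2, axis=1))
    step = 1.0 / (3.0 * L)          # conservative step size (assumed)
    if p is None:
        p = 1.0 / n                 # snapshot refresh probability (assumed)

    x = np.zeros(d)
    snapshot = x.copy()
    full_grad = A.T @ (A @ snapshot - b) / n

    for _ in range(iters):
        i = rng.integers(n)
        a_i = A[i]
        # SVRG estimator: grad f_i(x) - grad f_i(snapshot) + full gradient.
        v = a_i * (a_i @ x - b[i]) - a_i * (a_i @ snapshot - b[i]) + full_grad
        # Minimize the surrogate <v, y> + (1/(2*step))*||y - x||^2 + lam*||y||_1.
        x = soft_threshold(x - step * v, step * lam)
        # Loop-free variant: refresh the snapshot with probability p
        # instead of running an explicit inner loop.
        if rng.random() < p:
            snapshot = x.copy()
            full_grad = A.T @ (A @ snapshot - b) / n
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(50, 5))
    x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.0])
    b = A @ x_true
    x_hat = mm_loopless_svrg(A, b, lam=0.01)
    print("objective at zero:", objective(A, b, np.zeros(5), 0.01))
    print("objective at x_hat:", objective(A, b, x_hat, 0.01))
```

A SAGA-style or SARAH-style variant would change only how `v` is formed (a table of stored per-sample gradients, or a recursive update of the previous estimate, respectively); the surrogate-minimization step would stay the same in this simplified setting.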

Accelerated Stochastic ADMM with Variance Reduction

Stochastic Sub-Sampled Newton Method with Variance Reduction

Convergence analysis of stochastic higher-order majorization-minimization algorithms

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

The appeals of quadratic majorization–minimization

Generalized Majorization-Minimization for Non-Convex Optimization.

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

Nonconvex Optimization via MM Algorithms: Convergence Theory

Variance reduction techniques for stochastic proximal point algorithms

Universal Majorization-Minimization Algorithms

On the Global Convergence of Majorization Minimization Algorithms for Nonconvex Optimization Problems.

Relaxed Majorization-Minimization for Non-Smooth and Non-Convex Optimization

An Introduction to MM Algorithms for Machine Learning and Statistical

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Stochastic Nested Variance Reduction for Nonconvex Optimization

Distance Majorization and Its Applications

Single-Loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions

Stochastic average model methods

A Semismooth Newton Stochastic Proximal Point Algorithm with Variance Reduction