Abstract: Majorization-minimization (MM) is a family of optimization methods that iteratively reduce a loss by minimizing a locally-tight upper bound, called a majorizer. Traditionally, majorizers were derived by hand, and MM was only applicable to a small number of well-studied problems. We present optimizers that instead derive majorizers automatically, using a recent generalization of Taylor mode automatic differentiation. These universal MM optimizers can be applied to arbitrary problems and converge from any starting point, with no hyperparameter tuning.
What problem does this paper attempt to address?
The paper addresses a practical limitation of majorization-minimization (MM) optimization: the majorizer (a locally tight upper bound on the loss) has traditionally been derived by hand for each loss function, which makes MM impractical for complex losses such as those in deep learning. The paper proposes deriving majorizers automatically, using a recent generalization of Taylor-mode automatic differentiation. With this approach, MM can be applied to arbitrary problems and converges from any starting point, with no hyperparameter tuning.
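In symbols, using notation assumed here for illustration (the summary itself fixes none): write $f$ for the loss, $x_t$ for the current iterate, and $\bar f(\cdot\,; x_t)$ for the majorizer built at $x_t$. The standard MM conditions, and the descent guarantee that follows from them, can be sketched as:

$$
\bar f(x_t; x_t) = f(x_t), \qquad \bar f(x; x_t) \ge f(x) \quad \text{for all } x \text{ in the region where the bound is valid},
$$

$$
x_{t+1} = \arg\min_{x} \bar f(x; x_t) \;\implies\; f(x_{t+1}) \le \bar f(x_{t+1}; x_t) \le \bar f(x_t; x_t) = f(x_t).
$$

The first inequality holds because the majorizer is an upper bound, and the second because $x_{t+1}$ minimizes the majorizer, so the loss never increases from one iterate to the next.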
### Main contributions of the paper
1. **Automated derivation of majorizers**: The paper proposes general-purpose MM optimizers that derive majorizers automatically for any loss function. The derivation relies on a generalization of Taylor-mode automatic differentiation that uses interval arithmetic, and it produces usable majorizers for complex loss functions without excessive computational overhead.
2. **Wide applicability**: These general-purpose MM optimizers can be applied to any problem, from simple linear regression to multi-layer perceptron (MLP) models. This extends MM well beyond the small set of problems for which majorizers had been derived by hand.
3. **Theoretical guarantee**: The paper proves that these general-purpose MM optimizers reduce the loss monotonically and that, under certain conditions, they converge at a rate similar to that of backtracking line search, without any backtracking or hyperparameter tuning.
4. **Experimental verification**: In a series of experiments on various optimization problems, the paper reports that these general-purpose MM optimizers, run without any tuning, outperform existing optimizers such as Adam, AdaGrad, gradient descent, and backtracking line search.
### Method overview
- **SafeRate algorithm**: Determines a safe learning rate by automatically deriving a one-dimensional upper bound on the loss along the update direction, which guarantees that the loss does not increase in any iteration.
- **SafeCombination algorithm**: Determines a linear combination of multiple update directions by automatically deriving a multi-dimensional upper bound, with the same guarantee that the loss does not increase in any iteration.
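The idea behind a safe learning rate can be illustrated with a minimal sketch. This is not the paper's SafeRate implementation: the paper derives the upper bound automatically, whereas here the curvature bound is written by hand for one toy loss, $f(x) = \log\cosh(x)$. The names `loss`, `grad`, `curvature_bound`, `safe_rate_step`, and the `max_rate` parameter are all hypothetical, chosen for this example only.

```python
import math


def loss(x):
    """Toy one-dimensional loss f(x) = log(cosh(x)), minimized at x = 0."""
    return math.log(math.cosh(x))


def grad(x):
    """Derivative f'(x) = tanh(x)."""
    return math.tanh(x)


def curvature_bound(lo, hi):
    """Upper bound on f''(x) = 1 - tanh(x)^2 over [lo, hi].

    Hand-derived for this toy loss (an assumption of the sketch; the paper
    obtains such bounds automatically). f'' decreases as |x| grows, so its
    maximum over the interval is attained at the point closest to zero.
    """
    nearest = 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))
    return 1.0 - math.tanh(nearest) ** 2


def safe_rate_step(x, max_rate=4.0):
    """Take one step x - eta * g, with eta minimizing a quadratic majorizer.

    For eta in [0, max_rate], Taylor's theorem gives
        f(x - eta * g) <= f(x) - eta * g^2 + 0.5 * c * g^2 * eta^2,
    where c bounds f'' between x and x - max_rate * g. The right-hand side
    equals f(x) at eta = 0, so minimizing it cannot increase the loss.
    """
    g = grad(x)
    if g == 0.0:
        return x  # already at a stationary point
    end = x - max_rate * g
    c = curvature_bound(min(x, end), max(x, end))
    # Minimizer of the quadratic bound, clipped to the trust region.
    eta = max_rate if c <= 0.0 else min(max_rate, 1.0 / c)
    return x - eta * g


x = 3.0
for step in range(6):
    print(step, x, loss(x))
    x = safe_rate_step(x)
```

The learning rate is recomputed at every step from the bound, so nothing is tuned by hand; `max_rate` only limits the interval over which the bound must hold. A multi-directional variant in the spirit of SafeCombination would minimize an analogous bound over a vector of coefficients rather than a single rate.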
### Experimental results
- **One-dimensional problems**: On several synthetic one-dimensional problems, the SafeRate algorithm, which requires no tuning, outperforms baseline optimizers even after their hyperparameters have been tuned.
- **Multi-dimensional problems**: On more complex multi-dimensional problems, such as training multi-layer perceptrons, the SafeRate and SafeCombination algorithms also perform well and reach a lower loss in fewer iterations.
In conclusion, this paper significantly extends the application range of the MM optimization method by introducing a method for automated majorizer derivation and provides an efficient and robust optimization solution.