Majorization-minimization Bregman proximal gradient algorithms for nonnegative matrix factorization with the Kullback-Leibler divergence

Shota Takahashi, Mirai Tanaka, Shiro Ikeda
2024-08-01
Abstract: Nonnegative matrix factorization (NMF) is a popular method in machine learning and signal processing to decompose a given nonnegative matrix into two nonnegative matrices. In this paper, to solve NMF, we propose new algorithms, called majorization-minimization Bregman proximal gradient algorithm (MMBPG) and MMBPG with extrapolation (MMBPGe). MMBPG and MMBPGe minimize an auxiliary function majorizing the Kullback-Leibler (KL) divergence loss by the existing Bregman proximal gradient algorithms. While existing KL-based NMF methods update each variable alternately, proposed algorithms update all variables simultaneously. The proposed MMBPG and MMBPGe are equipped with a separable Bregman distance that satisfies the smooth adaptable property and that makes its subproblem solvable in closed forms. We also proved that even though these algorithms are designed to minimize an auxiliary function, MMBPG and MMBPGe monotonically decrease the objective function and a potential function, respectively. Using this fact, we establish that a sequence generated by MMBPG(e) globally converges to a Karush-Kuhn-Tucker (KKT) point. In numerical experiments, we compared proposed algorithms with existing algorithms on synthetic data.
Optimization and Control
What problem does this paper attempt to address?
This paper addresses the optimization problem in nonnegative matrix factorization (NMF) when the Kullback-Leibler (KL) divergence is used as the loss function. Specifically, it proposes the majorization-minimization Bregman proximal gradient algorithm (MMBPG) and its variant with extrapolation (MMBPGe) to solve the KL-divergence-based NMF problem.

### Background and Motivation

- **Nonnegative Matrix Factorization (NMF)**: Given a nonnegative matrix \(X\in\mathbb{R}^{m\times n}_{+}\), the goal of NMF is to find two nonnegative matrices \(W\in\mathbb{R}^{m\times r}_{+}\) and \(H\in\mathbb{R}^{r\times n}_{+}\) such that \(X\approx WH\).
- **KL Divergence**: One of the commonly used loss functions in NMF is the Kullback-Leibler divergence, defined as:
  \[ D(X, WH)=\sum_{i = 1}^{m}\sum_{j = 1}^{n}\left(X_{ij}\log\frac{X_{ij}}{(WH)_{ij}}-X_{ij}+(WH)_{ij}\right) \]
- **Limitations of Existing Methods**: Existing KL-NMF methods usually update \(W\) and \(H\) alternately, which may lead to convergence to a solution that is not globally optimal, especially when the objective function contains nonsmooth regularization terms.

### Main Contributions of the Paper

1. **New Algorithm**: The paper proposes a new algorithm framework, MMBPG and MMBPGe. These algorithms solve the KL-NMF problem by updating \(W\) and \(H\) simultaneously.
2. **Auxiliary Function**: MMBPG and MMBPGe indirectly minimize the objective function by minimizing an auxiliary function that majorizes (is an upper bound of) the original KL divergence loss.
3. **Smooth Adaptability**: The algorithms use a separable Bregman distance that satisfies the smooth adaptable property, which supports the global convergence analysis even when the gradient of the loss is not globally Lipschitz continuous.
4. **Closed-form Solution**: For specific regularization terms (such as \(\ell_1\) regularization and squared Frobenius norm regularization), the paper provides closed-form solutions to the subproblems, which simplifies the computation.
5. **Theoretical Guarantee**: The paper proves that, although both algorithms are designed to minimize an auxiliary function, MMBPG monotonically decreases the objective function and MMBPGe monotonically decreases a potential function. Using this fact, it establishes that a sequence generated by either algorithm globally converges to a Karush-Kuhn-Tucker (KKT) point.

### Numerical Experiments

- **Synthetic Data**: The paper compares the proposed algorithms with existing algorithms in numerical experiments on synthetic data to assess their effectiveness.

### Conclusion

The proposed MMBPG and MMBPGe come with theoretical convergence guarantees for the KL-divergence-based NMF problem and are evaluated numerically on synthetic data. By updating \(W\) and \(H\) simultaneously, they offer an alternative to existing methods that update the two factors alternately.
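
To make the KL divergence formula and the alternating scheme mentioned under "Limitations of Existing Methods" concrete, here is a minimal sketch of the classical multiplicative updates for KL-based NMF (the Lee-Seung scheme). This is a well-known alternating baseline and is not the paper's MMBPG or MMBPGe, whose Bregman distance and closed-form subproblem are not reproduced in this summary. The function names `kl_divergence` and `multiplicative_updates`, the random initialization, and the small constant `eps` are illustrative assumptions.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-12):
    """Generalized KL divergence D(X, WH), summed over all entries.

    eps is an illustrative safeguard against division by zero (an assumption).
    """
    X = np.asarray(X, dtype=float)
    WH = np.asarray(WH, dtype=float)
    # Convention 0 * log(0 / y) = 0: only entries with X > 0 enter the log term.
    mask = X > 0
    log_term = np.sum(X[mask] * np.log(X[mask] / (WH[mask] + eps)))
    return log_term - X.sum() + WH.sum()


def multiplicative_updates(X, r, n_iter=200, seed=0, eps=1e-12):
    """Classical alternating multiplicative updates for KL-NMF.

    NOT the paper's method: W and H are updated one after the other,
    which is the alternating pattern the summary contrasts with MMBPG.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization (illustrative choice).
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    history = [kl_divergence(X, W @ H)]
    for _ in range(n_iter):
        # Update W with H held fixed.
        R = X / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1) + eps)
        # Update H with the new W held fixed.
        R = X / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        history.append(kl_divergence(X, W @ H))
    return W, H, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((20, 3)) @ rng.random((3, 15))  # synthetic rank-3 data
    W, H, history = multiplicative_updates(X, r=3, n_iter=100)
    print("initial loss:", history[0])
    print("final loss:  ", history[-1])
```

The multiplicative form keeps \(W\) and \(H\) nonnegative without an explicit projection, and the recorded `history` lets one check that the loss does not increase across iterations. A simultaneous scheme such as MMBPG would instead compute both new factors from the same previous iterate.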