The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

Xingyu Xu, Yandi Shen, Yuejie Chi, Cong Ma
2023-11-06
Abstract: We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependency on the condition number. Our work provides evidence of the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
Machine Learning, Signal Processing, Optimization and Control
What problem does this paper attempt to address?
The paper addresses how to solve low-rank matrix sensing efficiently and robustly when the true rank is unknown (so the factor model is overparameterized) and the target matrix may be ill-conditioned. It proposes ScaledGD(λ), a preconditioned gradient descent method that starts from a small random initialization and converges quickly to the true low-rank matrix without knowing its rank. With noisy measurements, the method converges to an approximately minimax-optimal error, and its iteration complexity depends only logarithmically on the condition number and the problem dimension, which makes it well suited to large-scale, ill-conditioned problems. The main contributions are:
1. ScaledGD(λ), a preconditioned gradient descent method that achieves fast global convergence in the overparameterized setting.
2. A guarantee that, in the presence of measurement noise, ScaledGD(λ) converges at the same rate to an approximately minimax-optimal error.
3. Evidence that preconditioning accelerates optimization without hurting the generalization of overparameterized models.
4. A substantial speedup over vanilla GD on ill-conditioned, overparameterized problems, in both the initial phase and the local convergence phase.
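To make the method concrete, here is a minimal NumPy sketch of the ScaledGD(λ) idea; it is an illustration, not the authors' code. It assumes the symmetric positive semidefinite version of the problem, with loss f(X) = (1/4)||A(XX^T) - y||^2 over an n × r' factor X with r' ≥ r, and assumes the damped update X ← X - η ∇f(X)(X^T X + λI)^{-1} from a small random initialization X_0 = αG. The symmetrized Gaussian sensing matrices, the values η = 0.5, α = 1e-4, and λ = σ_min(M*)/2 (which presumes σ_min is known), and the helper names make_problem and scaled_gd_lambda are all hypothetical choices for this sketch; the paper's own parameter rules are not reproduced here.

```python
import numpy as np


def make_problem(n=10, true_rank=2, m=2000, cond=10.0, seed=1):
    """Hypothetical helper: build a PSD ground truth and symmetric Gaussian measurements."""
    rng = np.random.default_rng(seed)
    # Ground truth M* = U diag(s) U^T with eigenvalues spread from 1 down to 1/cond.
    U, _ = np.linalg.qr(rng.standard_normal((n, true_rank)))
    s = np.geomspace(1.0, 1.0 / cond, true_rank)
    M_star = (U * s) @ U.T
    # Assumed design: G_i has i.i.d. N(0, 1/m) entries, symmetrized so that A_i = A_i^T.
    G = rng.standard_normal((m, n, n)) / np.sqrt(m)
    A = (G + G.transpose(0, 2, 1)) / 2
    y = A.reshape(m, -1) @ M_star.reshape(-1)  # y_i = <A_i, M*>
    return A, y, M_star, s


def scaled_gd_lambda(A, y, rank_guess, lam, eta=0.5, alpha=1e-4, iters=500, seed=0):
    """Hypothetical sketch of ScaledGD(lambda) on f(X) = 1/4 * ||A(X X^T) - y||^2."""
    rng = np.random.default_rng(seed)
    m, n, _ = A.shape
    A_flat = A.reshape(m, n * n)
    # Small random initialization X_0 = alpha * G, with G_ij ~ N(0, 1/n).
    X = alpha * rng.standard_normal((n, rank_guess)) / np.sqrt(n)
    eye = np.eye(rank_guess)
    for _ in range(iters):
        residual = A_flat @ (X @ X.T).reshape(-1) - y
        # For symmetric A_i, the gradient is grad f(X) = sum_i residual_i * A_i @ X.
        grad = (residual @ A_flat).reshape(n, n) @ X
        # Damped preconditioner: right-multiply by (X^T X + lam * I)^{-1},
        # computed with a linear solve (valid because the matrix is symmetric).
        precond = X.T @ X + lam * eye
        X = X - eta * np.linalg.solve(precond, grad.T).T
    return X


A, y, M_star, s = make_problem()
# Overparameterized run: rank_guess = 4 while the true rank is 2.
X = scaled_gd_lambda(A, y, rank_guess=4, lam=0.5 * s.min())
rel_err = np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)
print(f"relative error: {rel_err:.2e}")
```

The λI term is what makes the preconditioner usable under overparameterization: near a rank-r solution, X^T X for an n × r' factor with r' > r is singular or nearly so, and at a small initialization it is close to zero, so undamped scaling by (X^T X)^{-1} would be ill-posed.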