The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

Xingyu Xu, Yandi Shen, Yuejie Chi, Cong Ma

2023-11-06

Abstract: We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
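For reference, the update the abstract describes can be written as below. This is a sketch in notation introduced here, not taken from this summary: the sensing operator $\mathcal{A}$ and its adjoint $\mathcal{A}^*$, step size $\eta$, damping $\lambda$, initialization scale $\alpha$, and overparameterized width $k \ge r$. The $\tfrac14$ loss normalization and the Gaussian initialization $G$ are assumptions.

```latex
% Assumed setting: y = \mathcal{A}(M_\star), M_\star = X_\star X_\star^\top \succeq 0, rank(M_\star) = r
\begin{aligned}
f(X) &= \tfrac{1}{4}\,\bigl\|\mathcal{A}(XX^\top) - y\bigr\|_2^2,
  \qquad X \in \mathbb{R}^{n \times k},\ k \ge r, \\
\nabla f(X) &= \mathcal{A}^*\!\bigl(\mathcal{A}(XX^\top) - y\bigr)\,X, \\
X_{t+1} &= X_t - \eta\,\nabla f(X_t)\,\bigl(X_t^\top X_t + \lambda I_k\bigr)^{-1},
  \qquad X_0 = \alpha G \ \text{(small random initialization, assumed Gaussian)}.
\end{aligned}
```

Setting $\lambda = 0$ gives the undamped ScaledGD preconditioner $(X_t^\top X_t)^{-1}$, which becomes singular once the $n \times k$ factor has rank below $k$, as it must near a rank-$r$ solution when $k > r$. The term $\lambda I_k$ keeps the preconditioner invertible, which is one way to read the "damped preconditioning" mentioned above.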

Machine Learning, Signal Processing, Optimization and Control

What problem does this paper attempt to address?

The paper addresses how to recover a potentially ill-conditioned low-rank matrix efficiently and robustly in overparameterized low-rank matrix sensing. Specifically, it proposes a preconditioned gradient descent method, ScaledGD(λ), that converges quickly to the true low-rank matrix from a small random initialization, even when the true rank is unknown. The method also converges to an approximately minimax-optimal error in the presence of measurement noise, and its iteration complexity depends only logarithmically on the condition number and the problem dimension, which makes it well suited to large-scale, ill-conditioned problems. The main contributions of the paper are:

1. Proposing ScaledGD(λ), a preconditioned gradient descent method that achieves fast global convergence in overparameterized settings.
2. Showing that ScaledGD(λ) converges to an approximately minimax-optimal error, at the same rate, in the presence of measurement noise.
3. Showing that preconditioning accelerates optimization without hurting the generalization of the overparameterized model.
4. Significantly speeding up convergence for ill-conditioned problems in overparameterized settings, in both the initial and the local stages, without compromising generalization.
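The contrast between ScaledGD(λ) and vanilla GD on an ill-conditioned, overparameterized instance can be illustrated with a small NumPy sketch. It is not the authors' code. The problem sizes, the symmetrized Gaussian sensing matrices and their $1/\sqrt{m}$ normalization, the loss $f(X)=\tfrac14\|\mathcal{A}(XX^\top)-y\|_2^2$, and the constants η = 0.3, λ = 0.1 (on the order of the smallest eigenvalue), α = 1e-6 are all assumptions. The helper names (`make_problem`, `grad_f`, `run`, `rel_err`) are hypothetical. The sketch plants a rank-2 PSD matrix with condition number 10, fits a factor with k = 4 columns, and runs both methods from the same small random initialization for the same number of iterations.

```python
import numpy as np


def make_problem(n=8, r=2, kappa=10.0, m=8000, seed=0):
    """Plant a rank-r PSD matrix M* with condition number kappa and observe
    y = A(M*) under a symmetrized Gaussian design (assumed normalization)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))
    eigs = np.geomspace(1.0, 1.0 / kappa, r)  # eigenvalues from 1 down to 1/kappa
    M_star = (U * eigs) @ U.T
    G = rng.standard_normal((m, n, n))
    # Symmetric sensing matrices, flattened into the rows of an (m, n*n) array.
    S = ((G + G.transpose(0, 2, 1)) / 2).reshape(m, n * n)
    y = S @ M_star.ravel() / np.sqrt(m)
    return S, y, M_star


def grad_f(X, S, y):
    """Gradient of f(X) = 1/4 ||A(X X^T) - y||^2, i.e. A*(A(X X^T) - y) X."""
    m, n = S.shape[0], X.shape[0]
    residual = S @ (X @ X.T).ravel() / np.sqrt(m) - y
    adjoint = (S.T @ residual).reshape(n, n) / np.sqrt(m)  # A*(residual), symmetric
    return adjoint @ X


def run(S, y, n, k, eta, lam=None, alpha=1e-6, iters=200, seed=1):
    """Vanilla GD when lam is None, otherwise the damped ScaledGD(lambda)-style update."""
    rng = np.random.default_rng(seed)
    X = alpha * rng.standard_normal((n, k))  # small random init with k >= r columns
    for _ in range(iters):
        g = grad_f(X, S, y)
        if lam is None:
            X = X - eta * g
        else:
            P = X.T @ X + lam * np.eye(k)  # damped k x k preconditioner
            X = X - eta * np.linalg.solve(P, g.T).T  # equals g @ inv(P) since P is symmetric
    return X


def rel_err(X, M_star):
    """Relative Frobenius error of X X^T against the planted matrix."""
    return np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)


S, y, M_star = make_problem()
n, k = M_star.shape[0], 4  # overparameterized: k = 4 > r = 2
scaled_err = rel_err(run(S, y, n, k, eta=0.3, lam=0.1), M_star)
gd_err = rel_err(run(S, y, n, k, eta=0.3), M_star)
print(f"ScaledGD(lambda): {scaled_err:.2e}   GD: {gd_err:.2e}")
```

Under these assumptions, GD grows the component along the eigenvalue 1/κ at a rate proportional to that eigenvalue, so with the same budget it is expected to leave that component largely unfitted. The damped preconditioner rescales each direction by roughly 1/(σ + λ), so both directions are fitted at comparable speed. Exact numbers depend on the assumed constants and seeds.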
