Fundamental limits of Non-Linear Low-Rank Matrix Estimation

Pierre Mergny, Justin Ko, Florent Krzakala, Lenka Zdeborová
2024-03-07
Abstract: We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient of the function. We provide asymptotic characterization for the minimal achievable mean squared error (MMSE) and an approximate message-passing algorithm that reaches the MMSE under conditions analogous to the linear version of the problem. We also provide asymptotic errors achieved by methods such as principal component analysis combined with Bayesian denoising, and compare them with Bayes-optimal MMSE.
Machine Learning
What problem does this paper attempt to address?
This paper studies non-linear low-rank matrix estimation, with particular attention to the case where the Fisher information of the probabilistic observation channel is zero. The authors prove a strong universality result: after an appropriate rescaling of the signal, an explicit entrywise transformation, determined by the structure of the non-linear function, maps the observed data to an equivalent Gaussian model. The key consequence is that accurate reconstruction of the signal requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient in the expansion of the function. The paper gives an asymptotic characterization of the minimum mean squared error (MMSE) and an approximate message-passing (AMP) algorithm that reaches the MMSE under conditions analogous to those of the linear problem. It also derives the asymptotic errors of methods such as principal component analysis (PCA) combined with Bayesian denoising and compares them with the Bayes-optimal MMSE.
On the information-theoretic side, the paper proves that the mutual information and free entropy of non-linear low-rank matrix estimation correspond to those of the Gaussian spiked model, and that the MMSE is likewise universal and can be expressed explicitly in terms of the MMSE of the Gaussian equivalent model. This extends earlier universality results, which were limited to $k_F = 1$, and provides a rigorous mapping to the standard spiked Wigner model.
On the algorithmic side, the standard AMP algorithm applied to the Fisher matrix can achieve the MMSE when the prior distribution of the target vector is known. The paper also shows that the Fisher matrix is the optimal matrix on which to perform PCA and proves, through a universality theorem for spectral methods, that optimally denoising the top eigenvectors of the Fisher matrix approaches the performance of AMP.
In summary, this paper provides a theoretical framework for non-linear low-rank matrix estimation, establishes information-theoretic and algorithmic performance limits, and provides practical methods for applications.
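To make the scaling law from the abstract concrete, the short Python sketch below evaluates the exponent $\frac 12 (1-1/k_F)$ and the corresponding order of growth $N^{\frac 12 (1-1/k_F)}$ for a few values of $k_F$. The function names `snr_exponent` and `required_snr_scale` are hypothetical helpers introduced only for this illustration, and the sketch captures the order of growth only: any constant prefactor, which the summary does not state, is ignored.

```python
def snr_exponent(k_f: int) -> float:
    """Exponent a such that the required signal-to-noise ratio grows as N**a.

    Implements a = (1/2) * (1 - 1/k_F), where k_F >= 1 is the index of the
    first non-zero Fisher information coefficient (hypothetical helper name).
    """
    if k_f < 1:
        raise ValueError("k_F must be a positive integer")
    return 0.5 * (1.0 - 1.0 / k_f)


def required_snr_scale(n: int, k_f: int) -> float:
    """Order of growth N**a of the required signal-to-noise ratio.

    Constant prefactors are not specified in the text and are omitted here.
    """
    return float(n) ** snr_exponent(k_f)


if __name__ == "__main__":
    n = 10_000
    for k_f in (1, 2, 3, 4):
        a = snr_exponent(k_f)
        scale = required_snr_scale(n, k_f)
        print(f"k_F = {k_f}: exponent = {a:.4f}, N^exponent at N = {n}: {scale:.2f}")
```

For $k_F = 1$ the exponent is zero, so a signal-to-noise ratio of constant order suffices, which matches the previously studied regime. As $k_F$ grows, the exponent increases toward $\frac 12$, so channels whose first informative coefficient appears at higher order need a correspondingly stronger signal.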