Linear Recursive Feature Machines provably recover low-rank matrices

Adityanarayanan Radhakrishnan, Mikhail Belkin, Dmitriy Drusvyatskiy
2024-01-09
Abstract: A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction, a process called feature learning. Recent work posited that the effects of feature learning can be elicited from a classical statistical estimator called the average gradient outer product (AGOP). The authors proposed Recursive Feature Machines (RFMs) as an algorithm that explicitly performs feature learning by alternating between (1) reweighting the feature vectors by the AGOP and (2) learning the prediction function in the transformed space. In this work, we develop the first theoretical guarantees for how RFM performs dimensionality reduction by focusing on the class of overparametrized problems arising in sparse linear regression and low-rank matrix recovery. Specifically, we show that RFM restricted to linear models (lin-RFM) generalizes the well-studied Iteratively Reweighted Least Squares (IRLS) algorithm. Our results shed light on the connection between feature learning in neural networks and classical sparse recovery algorithms. In addition, we provide an implementation of lin-RFM that scales to matrices with millions of missing entries. Our implementation is faster than the standard IRLS algorithm because it is SVD-free. It also outperforms deep linear networks for sparse linear regression and low-rank matrix completion.
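The alternation in steps (1) and (2) can be illustrated on the simplest case, which is a linear model for sparse linear regression. What follows is a minimal sketch under our own assumptions, not the authors' implementation. The function name lin_rfm_sparse, the diagonal reweighting m = diag(AGOP)^α, the value α = 0.375, and the fixed iteration count are all hypothetical choices. For a linear predictor f(x) = βᵀx the AGOP is ββᵀ, so reweighting scales coordinate j by |β_j|^(2α). Fitting then means taking the minimum-norm linear interpolant on the rescaled features.

```python
import numpy as np

def lin_rfm_sparse(X, y, alpha=0.375, iters=50):
    """Hypothetical lin-RFM sketch for sparse linear regression with n < d (assumed parametrization)."""
    d = X.shape[1]
    m = np.ones(d)                      # diagonal feature weights; start from the identity
    beta = np.zeros(d)
    for _ in range(iters):
        # Fit: minimum-norm w with (X * m) @ w = y, i.e. a linear model on the reweighted features.
        w = np.linalg.lstsq(X * m, y, rcond=None)[0]
        beta = m * w                    # the same predictor expressed in the original coordinates
        # Reweight: for f(x) = beta @ x the AGOP is outer(beta, beta); use diag(AGOP) ** alpha.
        m = np.abs(beta) ** (2 * alpha)
    return beta

rng = np.random.default_rng(0)
n, d = 25, 40
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[[3, 17]] = [1.5, -2.0]        # 2-sparse ground truth
y = X @ beta_true

beta_min_norm = lin_rfm_sparse(X, y, iters=1)   # one step = ordinary minimum-norm least squares
beta_hat = lin_rfm_sparse(X, y)
```

Under this parametrization, each fit minimizes Σ_j β_j²/m_j² subject to Xβ = y. This is a weighted least-norm problem whose weights |β_j|^(-4α) come from the previous iterate. Those are the IRLS weights for the ℓ_p quasinorm with p = 2 - 4α (p = 0.5 here), which gives one way to see how lin-RFM can generalize IRLS. The first iterate (iters=1) is the ordinary minimum-norm solution, which is typically dense.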
Machine Learning
What problem does this paper attempt to address?
The paper studies how Recursive Feature Machines (RFMs) perform dimensionality reduction, focusing on sparse linear regression and low-rank matrix recovery. RFM is an algorithm that explicitly performs feature learning. It reweights feature vectors by the average gradient outer product (AGOP) and then learns a prediction function in the transformed space. The authors show that restricting RFM to linear models (lin-RFM) yields a generalization of Iteratively Reweighted Least Squares (IRLS).

The paper gives the first theoretical guarantees for lin-RFM, showing how it performs dimensionality reduction in overparameterized sparse linear regression and low-rank matrix recovery problems. Lin-RFM is closely connected to both IRLS and deep linear networks. In certain cases it can be viewed as a variant of IRLS. For specific values of α (the exponent applied to the AGOP), lin-RFM matches the implicit bias of a deep linear network of depth 1/(1-2α).

In addition, the paper provides an SVD-free implementation of lin-RFM. This implementation scales to matrices with millions of missing entries and runs faster than standard IRLS. It also outperforms deep linear networks on sparse linear regression and low-rank matrix completion. These results connect feature learning in neural networks with classical sparse recovery algorithms. They also suggest that RFM finds sparse or low-rank solutions by explicitly iterating the fixed-point equation of a regularized objective, whereas deep learning methods typically rely on the implicit bias of gradient-based training.
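A similar hypothetical sketch covers low-rank matrix completion. It is not the paper's SVD-free implementation, since it calls an eigendecomposition to form a matrix power. The names lin_rfm_complete and mat_power_psd, the row-by-row formulation, and the default values are our assumptions. Each row of the matrix is treated as a linear model, fit with minimum norm on features reweighted by M = (WᵀW)^α, where WᵀW is the AGOP of the linear map x ↦ Wx.

With α = 1/4, each step minimizes tr(W (W_oldᵀW_old)^(-1/2) Wᵀ) over matrices that agree with the observed entries. That is the IRLS update for the nuclear norm (p = 2 - 4α = 1). It lines up with the depth formula, since 1/(1-2α) = 2 and a depth-2 factorization W = AB is tied to the nuclear norm. It also implies that the nuclear norm of the iterates never increases.

```python
import numpy as np

def mat_power_psd(A, power):
    """A ** power for a symmetric PSD matrix, computed via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)          # drop tiny negative eigenvalues caused by rounding
    return (vecs * vals ** power) @ vecs.T

def lin_rfm_complete(Y, mask, alpha=0.25, iters=10):
    """Hypothetical lin-RFM sketch for matrix completion; mask[i, j] is True if Y[i, j] is observed."""
    n_rows, d = Y.shape
    M = np.eye(d)                            # feature reweighting; starts at the identity
    W = np.zeros((n_rows, d))
    for _ in range(iters):
        for i in range(n_rows):
            obs = mask[i]
            # Minimum-norm z whose reweighted prediction M @ z matches the observed entries of row i.
            z = np.linalg.lstsq(M[obs, :], Y[i, obs], rcond=None)[0]
            W[i] = M @ z                     # row i of the predictor in the original coordinates
        # The AGOP of the linear map x -> W @ x is W.T @ W; reweight by its alpha-th power.
        M = mat_power_psd(W.T @ W, alpha)
    return W

rng = np.random.default_rng(1)
Y = np.outer(rng.standard_normal(10), rng.standard_normal(8))   # rank-1 ground truth
mask = rng.random(Y.shape) < 0.7
mask[:, 0] = True                            # make sure every row has at least one observation

W_first = lin_rfm_complete(Y, mask, iters=1)  # minimum-Frobenius completion: zeros off the mask
W_hat = lin_rfm_complete(Y, mask)
rel_err = np.linalg.norm(W_hat - Y) / np.linalg.norm(Y)
```

Here rel_err measures the distance from the rank-1 ground truth. Neither α nor the iteration count has been tuned.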