Abstract: A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction, a process called feature learning. Recent work posited that the effects of feature learning can be elicited from a classical statistical estimator called the average gradient outer product (AGOP). The authors proposed Recursive Feature Machines (RFMs) as an algorithm that explicitly performs feature learning by alternating between (1) reweighting the feature vectors by the AGOP and (2) learning the prediction function in the transformed space. In this work, we develop the first theoretical guarantees for how RFM performs dimensionality reduction by focusing on the class of overparametrized problems arising in sparse linear regression and low-rank matrix recovery. Specifically, we show that RFM restricted to linear models (lin-RFM) generalizes the well-studied Iteratively Reweighted Least Squares (IRLS) algorithm. Our results shed light on the connection between feature learning in neural networks and classical sparse recovery algorithms. In addition, we provide an implementation of lin-RFM that scales to matrices with millions of missing entries. Our implementation is faster than the standard IRLS algorithm as it is SVD-free. It also outperforms deep linear networks for sparse linear regression and low-rank matrix completion.
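To make the two-step alternation concrete, here is a minimal sketch of a diagonal lin-RFM loop for underdetermined sparse linear regression. The function names (agop, lin_rfm_sparse), the parameters alpha and eps, and the convention that features are rescaled as x ↦ m ⊙ x with m = diag(AGOP)^α are assumptions made for illustration, not the authors' code. For a linear model f(x) = θ^T x the gradient is θ at every x, so the AGOP is exactly θθ^T. Under this convention, with α = 1/4 each step minimizes Σ θ_i² / (θ_{t,i}² + ε)^{1/2} subject to Xθ = y, which is the weighted least-squares step of IRLS for ℓ1 minimization.

```python
import numpy as np


def agop(grad_fn, X):
    """Average gradient outer product: (1/n) * sum_i grad f(x_i) grad f(x_i)^T."""
    G = np.stack([grad_fn(x) for x in X])  # one gradient per row, shape (n, d)
    return G.T @ G / X.shape[0]


def lin_rfm_sparse(X, y, alpha=0.25, iters=100, eps=1e-10):
    """Sketch of a diagonal lin-RFM loop for underdetermined sparse regression.

    Assumed (hypothetical) convention: features are rescaled as x -> m * x,
    and m is reset to diag(AGOP) ** alpha after every fit.
    """
    n, d = X.shape
    m = np.ones(d)  # diagonal reweighting, initialised to the identity
    theta = np.zeros(d)
    for _ in range(iters):
        # (2) learn the predictor in the transformed space: the minimum-norm
        #     interpolant w of (X * m) w = y, mapped back to theta = m * w.
        w = np.linalg.pinv(X * m) @ y
        theta = m * w
        # (1) reweight by the AGOP; for f(x) = theta^T x the gradient is theta
        #     at every x, so the AGOP is exactly theta theta^T.
        G = agop(lambda x: theta, X)
        m = (np.diag(G) + eps) ** alpha
    return theta


# Usage: recover a 5-sparse vector in R^100 from 60 Gaussian measurements.
rng = np.random.default_rng(0)
n, d, k = 60, 100, 5
X = rng.standard_normal((n, d))
theta_true = np.zeros(d)
theta_true[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
y = X @ theta_true
theta_hat = lin_rfm_sparse(X, y)
print(np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true))
```

The pseudoinverse returns the minimum-norm interpolant in the transformed coordinates, so the reweighting acts as the only regularizer; the small eps keeps every weight positive once coordinates shrink toward zero.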

What problem does this paper attempt to address?

The paper asks how Recursive Feature Machines (RFM) perform dimensionality reduction and gives the first theoretical guarantees for it. RFM is an algorithm that performs feature learning explicitly: it reweights feature vectors by the average gradient outer product (AGOP) and then learns the prediction function in the transformed space. The authors focus on overparameterized sparse linear regression and low-rank matrix recovery, and analyze RFM restricted to linear models (lin-RFM) in these settings.

Lin-RFM is closely connected to both Iteratively Reweighted Least Squares (IRLS) and deep linear networks. It generalizes IRLS, reducing to a variant of IRLS in certain cases, and for specific values of α it is equivalent to a deep linear network of depth 1/(1-2α), in the sense of sharing that network's implicit bias. On the practical side, the paper provides an SVD-free implementation of lin-RFM that scales to matrices with millions of missing entries; it is faster than standard IRLS and outperforms deep linear networks on sparse linear regression and low-rank matrix completion.

These results connect feature learning in neural networks to classical sparse recovery algorithms. They indicate that RFM finds sparse or low-rank solutions by explicitly using the fixed-point equation of a regularized objective, whereas deep learning typically reaches such solutions through the implicit bias of gradient-based training.

Linear Recursive Feature Machines provably recover low-rank matrices

Low Rank Matrix Recovery with Adversarial Sparse Noise

Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks

On Feature Scaling of Recursive Feature Machines

Alternating Iteratively Reweighted Minimization Algorithms for Low-Rank Matrix Factorization

Low-rank matrix recovery with non-quadratic loss: projected gradient method and regularity projection oracle

LERE: Learning-Based Low-Rank Matrix Recovery with Rank Estimation

Convergence and stability of iteratively reweighted least squares for low-rank matrix recovery

Guarantees of Riemannian Optimization for Low Rank Matrix Recovery

Robust Regularized Low-Rank Matrix Models for Regression and Classification

Generalized Nonconvex Nonsmooth Low-Rank Matrix Recovery Framework with Feasible Algorithm Designs and Convergence Analysis

RIP-based Performance Guarantee for Low Rank Matrix Recovery via $L_{*-F}$ Minimization

Leveraging subspace information for low-rank matrix reconstruction

Sine Activated Low-Rank Matrices for Parameter Efficient Learning

Efficient Low Rank Matrix Recovery With Flexible Group Sparse Regularization

Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks

Robust Low-Rank Matrix Factorization Via Block Iteratively Reweighted Least-Squares

Matrix Recovery with Implicitly Low-Rank Data

Logarithmic Norm Regularized Low-Rank Factorization for Matrix and Tensor Completion

Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning