Abstract: We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask what is the minimum sample complexity to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate for the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically this can be achieved using $\Theta(k \log (d/k))$ samples. Yet, despite its prominence in the literature, there is no polynomial-time algorithm known to achieve the same guarantees using fewer than $\Theta(d)$ samples without additional restrictions on the model. Similarly, existing hardness results are either restricted to the proper setting, in which the estimate must be sparse as well, or only apply to specific algorithms. We give evidence that efficient algorithms for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes in which efficient algorithms are widely believed to require at least $\Omega(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce. Our hardness results apply to the (correlated) random design setting in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.


### What problem does this paper attempt to solve?

This paper investigates computational-statistical gaps in sparse linear regression. Specifically, given $n$ samples drawn from a $k$-sparse linear model in dimension $d$, it asks for the minimum sample complexity at which a polynomial-time algorithm can find a potentially dense estimator that achieves non-trivial prediction error.

#### Research Background:

- **Sparse Linear Regression Model**: Given a set of samples $(\mathbf{x}_i, y_i)$ drawn from a high-dimensional sparse linear model where $y_i = \langle \mathbf{x}_i, \boldsymbol{\beta}^* \rangle + \epsilon_i$, the goal is to find an estimator $\hat{\boldsymbol{\beta}}$ whose predictions approximate those of the unknown $k$-sparse vector $\boldsymbol{\beta}^*$.
- **Computational-Statistical Gaps**: Information-theoretically, $\Theta(k \log(d/k))$ samples suffice to achieve non-trivial prediction error; however, without additional model constraints, no known polynomial-time algorithm achieves the same guarantee with fewer than $\Theta(d)$ samples.

#### Main Contributions:

- By constructing a reduction from sparse principal component analysis (sparse PCA) to sparse linear regression, the paper gives evidence that even in the improper setting (i.e., the output can be dense), polynomial-time algorithms require at least (roughly) $\Omega(k^2)$ samples.
- The authors connect sparse linear regression to the negative-spiked sparse PCA problem in its Wishart form, and support the hardness of the latter with low-degree and statistical query lower bounds.

#### Method Overview:

- By constructing a special instance of sparse PCA where the noise variance is unknown, the authors give evidence for the hardness of the sparse linear regression problem.
- In the case where the noise variance is known, the paper also gives evidence for the hardness of the sparse linear regression problem and further substantiates this by constructing a symmetric distinguishing problem.

In summary, the paper gives evidence of a computational-statistical gap in sparse linear regression: it supports a lower bound of roughly $\Omega(k^2)$ samples for polynomial-time improper learners, compared with the $\Theta(k \log(d/k))$ samples that suffice information-theoretically.
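
To make the setting concrete, the following is a minimal Python sketch of the correlated random design model and the three sample-size scales discussed above. It is an illustration under stated assumptions, not the paper's construction: the function names, the particular random covariance, the Gaussian noise, and the in-sample mean squared prediction error are all hypothetical choices made for this example, and the constants hidden in $\Theta(\cdot)$ and $\Omega(\cdot)$ are set to 1.

```python
import numpy as np


def sample_sparse_regression(n, d, k, noise_std=1.0, seed=0):
    """Draw n samples from a k-sparse linear model in dimension d.

    Assumptions (not from the source): the covariance is an arbitrary
    well-conditioned positive definite matrix, the nonzero coefficients
    are standard Gaussian, and the noise is Gaussian.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical covariance, unknown to the learner: A A^T + I.
    A = rng.standard_normal((d, d)) / np.sqrt(d)
    cov = A @ A.T + np.eye(d)

    # Rows of X are i.i.d. N(0, cov), sampled via a Cholesky factor.
    L = np.linalg.cholesky(cov)
    X = rng.standard_normal((n, d)) @ L.T

    # k-sparse regression vector with a uniformly random support.
    beta_star = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    beta_star[support] = rng.standard_normal(k)

    y = X @ beta_star + noise_std * rng.standard_normal(n)
    return X, y, beta_star


def prediction_error(X, beta_hat, beta_star):
    """In-sample prediction error (1/n) * ||X (beta_hat - beta_star)||^2.

    beta_hat may be dense: this is what "improper" learning permits.
    """
    residual = X @ (beta_hat - beta_star)
    return float(np.mean(residual ** 2))


def sample_scales(d, k):
    """Sample-size scales from the summary, with all constants set to 1."""
    return {
        "information_theoretic": k * np.log(d / k),  # Theta(k log(d/k))
        "conjectured_efficient": k ** 2,             # roughly Omega(k^2)
        "known_efficient_general": d,                # Theta(d)
    }


if __name__ == "__main__":
    X, y, beta_star = sample_sparse_regression(n=50, d=20, k=3)
    print(prediction_error(X, np.zeros(20), beta_star))
    print(sample_scales(d=1000, k=10))
```

With `d = 1000` and `k = 10`, the three scales are ordered as $k \log(d/k) < k^2 < d$, which is the regime in which the gap is visible.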

Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression

Average case analysis of Lasso under ultra-sparse conditions

Lower bounds on the performance of polynomial-time algorithms for sparse linear regression

Hardness and Algorithms for Robust and Sparse Optimization

Optimal Sketching Bounds for Sparse Linear Regression

Robust Sparse Regression with Non-Isotropic Designs

Robust Sparse Mean Estimation via Sum of Squares

Low Rank Approximation and Regression in Input Sparsity Time

On the Sample Complexity of Predictive Sparse Coding

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Feature Adaptation for Sparse Linear Regression

Sparse Linear Regression and Lattice Problems

Online Kernel Learning with a Near Optimal Sparsity Bound

A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation

Sparse PCA Beyond Covariance Thresholding

Statistical and Computational Limits for Sparse Matrix Detection

High Dimensional Robust Sparse Regression

Reducibility and Statistical-Computational Gaps from Secret Leakage

Regularity Properties for Sparse Regression

Near-Optimal Time-Sparsity Trade-Offs for Solving Noisy Linear Equations

A note on the minimax risk of sparse linear regression