Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression

Rares-Darius Buhai, Jingqiu Ding, Stefan Tiegel
2024-06-25
Abstract: We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask what is the minimum sample complexity to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate for the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically this can be achieved using $\Theta(k \log (d/k))$ samples. Yet, despite its prominence in the literature, there is no polynomial-time algorithm known to achieve the same guarantees using fewer than $\Theta(d)$ samples without additional restrictions on the model. Similarly, existing hardness results are either restricted to the proper setting, in which the estimate must be sparse as well, or only apply to specific algorithms. We give evidence that efficient algorithms for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes in which efficient algorithms are widely believed to require at least $\Omega(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce. Our hardness results apply to the (correlated) random design setting in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.
Machine Learning, Computational Complexity, Statistics Theory
What problem does this paper attempt to address?
This paper investigates computational-statistical gaps for improper learning in sparse linear regression. Specifically, given \(n\) samples from a \(k\)-sparse linear model in dimension \(d\), it asks how many samples a polynomial-time algorithm needs in order to find a potentially dense estimator that achieves non-trivial prediction error on those samples.

#### Research Background:
- **Sparse Linear Regression Model**: Given samples \((\mathbf{x}_i, y_i)\) drawn from a high-dimensional sparse linear model \(y_i = \langle \mathbf{x}_i, \boldsymbol{\beta}^* \rangle + \epsilon_i\), where \(\boldsymbol{\beta}^*\) is an unknown \(k\)-sparse vector in dimension \(d\), the goal is to find an estimator \(\hat{\boldsymbol{\beta}}\) whose predictions approximate those of \(\boldsymbol{\beta}^*\). In the improper setting, \(\hat{\boldsymbol{\beta}}\) is allowed to be dense.
- **Computational-Statistical Gaps**: Information-theoretically, \(\Theta(k \log(d/k))\) samples suffice to achieve non-trivial prediction error; however, without additional restrictions on the model, no known polynomial-time algorithm achieves the same guarantee with fewer than \(\Theta(d)\) samples.

#### Main Contributions:
- Through a reduction from sparse principal component analysis (sparse PCA) to sparse linear regression, the paper gives evidence that, even in the improper setting (i.e., the output can be dense), polynomial-time algorithms require at least (roughly) \(\Omega(k^2)\) samples.
- The authors show that an improper learner for sparse linear regression can be used to solve negative-spike sparse PCA problems in their Wishart form, in regimes where efficient algorithms are widely believed to require at least \(\Omega(k^2)\) samples.
- The reduction is complemented by low-degree and statistical query lower bounds for the sparse PCA problems from which it starts.

#### Method Overview:
- When the noise variance is unknown, the authors establish hardness of sparse linear regression by constructing a special instance of sparse PCA.
- When the noise variance is known, the paper also establishes hardness of sparse linear regression and further substantiates it by constructing a symmetric distinguishing problem.
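The sparse linear model and the notion of prediction error described above can be made concrete with a small simulation. The following is a minimal sketch, not the paper's construction: the covariance matrix, the unit-magnitude entries of the regression vector, the noise level, and the function names `sample_sparse_regression` and `prediction_error` are all illustrative assumptions. It draws correlated Gaussian covariates, then compares a dense (improper) least-squares estimate against the trivial all-zeros estimate in the regime \(n \ge d\), where efficient algorithms are known to succeed.

```python
import numpy as np

def sample_sparse_regression(n, d, k, noise_std=1.0, seed=0):
    """Draw n samples from a k-sparse linear model with correlated Gaussian covariates."""
    rng = np.random.default_rng(seed)
    # Assumed covariance: a random, well-conditioned PSD matrix (unknown to the learner).
    A = rng.standard_normal((d, d)) / np.sqrt(d)
    Sigma = A @ A.T + 0.5 * np.eye(d)
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    # Assumed signal: k-sparse vector with +/-1 entries on a random support.
    beta = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    beta[support] = rng.choice([-1.0, 1.0], size=k)
    y = X @ beta + noise_std * rng.standard_normal(n)
    return X, y, beta

def prediction_error(X, beta_hat, beta_star):
    """In-sample prediction error (1/n) * ||X (beta_hat - beta_star)||^2."""
    return float(np.mean((X @ (beta_hat - beta_star)) ** 2))

# Regime n >= d: ordinary least squares returns a dense estimate.
X, y, beta_star = sample_sparse_regression(n=400, d=100, k=5)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

err_ols = prediction_error(X, beta_ols, beta_star)
err_zero = prediction_error(X, np.zeros_like(beta_star), beta_star)
print(f"least squares: {err_ols:.3f}, all-zeros baseline: {err_zero:.3f}")
```

Here "non-trivial" is read as "better than the all-zeros baseline", which is an assumption about the intended meaning. The open question concerns the regime where \(n\) is far below \(d\), in which least squares no longer gives a useful estimate.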
In summary, the paper gives evidence, via a reduction from negative-spike sparse PCA supported by low-degree and statistical query lower bounds, that polynomial-time improper learners for sparse linear regression need roughly \(\Omega(k^2)\) samples, well above the information-theoretic requirement of \(\Theta(k \log(d/k))\).
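To get a feel for the size of the gap, the three sample-size scalings mentioned above can be evaluated for concrete values. This is a rough sketch that ignores all constants and lower-order factors hidden in the asymptotic notation; the function name `sample_regimes` and the chosen values of \(d\) and \(k\) are illustrative assumptions.

```python
import math

def sample_regimes(d, k):
    """Evaluate the three scalings with all hidden constants set to 1 (an assumption)."""
    return {
        "information_theoretic": k * math.log(d / k),  # Theta(k log(d/k))
        "conjectured_efficient": k ** 2,               # roughly Omega(k^2)
        "known_efficient": d,                          # Theta(d)
    }

regimes = sample_regimes(d=10**6, k=100)
for name, value in regimes.items():
    print(f"{name:>24}: {value:,.0f}")
```

With these values the three quantities are well separated. The \(k^2\) scaling only falls below the \(d\) scaling when \(k\) is smaller than about \(\sqrt{d}\), which is where the conjectured lower bound leaves room between it and the known efficient algorithms.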