Abstract: Over the past decade, characterizing the exact asymptotic risk of regularized estimators in high-dimensional regression has emerged as a popular line of work. This literature considers the proportional asymptotics framework, where the number of features and the number of samples both diverge at rates proportional to each other. Substantial work in this area relies on Gaussianity assumptions on the observed covariates. Further, these studies often assume the design entries to be independent and identically distributed. Parallel research investigates the universality of these findings, revealing that results based on the i.i.d. Gaussian assumption extend to a broad class of designs, such as i.i.d. sub-Gaussians. However, universality results examining dependent covariates have so far focused on correlation-based dependence or a highly structured form of dependence, as permitted by right rotationally invariant designs. In this paper, we break this barrier and study a dependence structure that in general falls outside the purview of these established classes. We seek to pin down the extent to which results based on i.i.d. Gaussian assumptions persist. We identify a class of designs characterized by a block dependence structure that ensures the universality of i.i.d. Gaussian-based results. We establish that the optimal values of the regularized empirical risk and the risk associated with convex regularized estimators, such as the Lasso and ridge, converge to the same limit under block dependent designs as they do for i.i.d. Gaussian entry designs. Our dependence structure differs significantly from correlation-based dependence, and enables, for the first time, asymptotically exact risk characterization in prevalent nonparametric regression problems in high dimensions. Finally, we illustrate through experiments that this universality becomes evident quite early, even for relatively moderate sample sizes.
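
The kind of comparison described in the abstract can be sketched in a small simulation. The sketch below is illustrative only and is not the paper's experimental setup. The block dependent design is a hypothetical construction: within each block, the features are cosine basis functions $\sqrt{2}\cos(k\pi U)$ of a single uniform variable $U$, so they have mean zero and unit variance and are uncorrelated, yet dependent. The function names, the normalization of the design by $\sqrt{n}$, and the choice of ridge with a fixed penalty `lam` are all assumptions made for this example.

```python
import numpy as np


def gaussian_design(n, p, rng):
    """Reference design: i.i.d. standard Gaussian entries."""
    return rng.standard_normal((n, p))


def block_dependent_design(n, p, block_size, rng):
    """Hypothetical block dependent design (an assumption for illustration).

    Each row draws one Uniform(0, 1) variable per block; the block's
    features are sqrt(2) * cos(k * pi * U) for k = 1..block_size.
    These are mean zero, unit variance, and mutually uncorrelated,
    but dependent because they share the same U.
    """
    if p % block_size != 0:
        raise ValueError("p must be a multiple of block_size")
    n_blocks = p // block_size
    u = rng.uniform(size=(n, n_blocks, 1))
    k = np.arange(1, block_size + 1).reshape(1, 1, block_size)
    x = np.sqrt(2.0) * np.cos(np.pi * k * u)
    # Flatten (n, n_blocks, block_size) so each block occupies adjacent columns.
    return x.reshape(n, p)


def ridge_risk(X, beta, sigma, lam, rng):
    """Per-coordinate squared estimation error of ridge on one simulated data set."""
    n, p = X.shape
    A = X / np.sqrt(n)  # assumed normalization: columns have norm of order one
    y = A @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
    return float(np.mean((beta_hat - beta) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, block_size = 400, 200, 4  # moderate sizes with p / n = 0.5
    beta = rng.standard_normal(p)
    designs = {
        "iid Gaussian": gaussian_design(n, p, rng),
        "block dependent": block_dependent_design(n, p, block_size, rng),
    }
    for name, X in designs.items():
        print(name, ridge_risk(X, beta, sigma=1.0, lam=1.0, rng=rng))
```

Running the script prints one ridge risk per design. Averaging over several seeds and increasing `n` and `p` at a fixed ratio would give a rough empirical check of how close the two risks are under this assumed construction.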

Simultaneous support recovery in high dimensions: Benefits and perils of block $\ell_1/\ell_\infty$-regularization

Sparse Estimation Via $\ell_q$ Optimization Method in High-Dimensional Linear Regression

Sharp Threshold for Multivariate Multi-Response Linear Regression via Block Regularized Lasso

Signed Support Recovery for Single Index Models in High-Dimensions

Sparse Support Recovery with Non-smooth Loss Functions

$\ell_1$-Regularized Generalized Least Squares

Generalization of l1 constraints for high dimensional regression problems

Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

The Impact of Regularization on High-dimensional Logistic Regression

Theoretical limits of descending $\ell_0$ sparse-regression ML algorithms

On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression

Adaptive $l_p$ $(0<p<1)$ Regularization: Oracle Property and Applications

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Universality in block dependent linear models with applications to nonparametric regression

Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

Estimation with Norm Regularization

Feature Selection With $\ell_{2,1-2}$ Regularization

Graph-based regularization for regression problems with alignment and highly-correlated designs

Noisy recovery from random linear observations: Sharp minimax rates under elliptical constraints

High-Dimensional Linear Regression via Implicit Regularization