Abstract: In the context of high-dimensional Gaussian linear regression for ordered variables, we study the variable selection procedure via the minimization of the penalized least-squares criterion. We focus on model selection where the penalty function depends on an unknown multiplicative constant commonly calibrated for prediction. We propose a new proper calibration of this hyperparameter to simultaneously control predictive risk and false discovery rate. We obtain non-asymptotic bounds on the False Discovery Rate with respect to the hyperparameter and we provide an algorithm to calibrate it. This algorithm is based on quantities that can typically be observed in real data applications. The algorithm is validated in an extensive simulation study and is compared with several existing variable selection procedures. Finally, we study an extension of our approach to the case in which an ordering of the variables is not available.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses variable selection for ordered variables in high-dimensional Gaussian linear regression. Specifically, it studies variable selection by minimizing a penalized least-squares criterion whose penalty function depends on an unknown multiplicative constant that is commonly calibrated for prediction. The main objective is to propose a new calibration of this hyperparameter that simultaneously controls the predictive risk (PR) and the false discovery rate (FDR). The paper derives non-asymptotic bounds on the FDR with respect to the hyperparameter and provides a calibration algorithm based on quantities that can typically be observed in real data applications.

### Key contributions

1. **Simultaneous control of FDR and PR**:
   - Proposes a new hyperparameter calibration method that simultaneously controls the predictive risk and the false discovery rate during model selection.
   - Shows that an appropriately calibrated penalty function yields non-asymptotic control of the predictive risk.
2. **Theoretical results**:
   - Derives non-asymptotic upper and lower bounds on the FDR when the variable order is given and the variance is known.
   - Proves that the FDR converges to 0 at an exponential rate as the hyperparameter \( K \) tends to infinity.
3. **Algorithm verification**:
   - Proposes an algorithm to calibrate the hyperparameter \( K \) and validates it in an extensive simulation study.
   - Compares the proposed algorithm with existing variable selection methods and reports better prediction performance and FDR control.
4. **Extension to non-ordered variable selection**:
   - Explores how to estimate the variable order with a data-driven method when no natural order is available, and applies the estimated order in the model selection process.
### Formulas and symbols

- **Model definition**:
  \[ Y = X\beta^* + \epsilon \]
  where \( Y\in\mathbb{R}^n \) is the response vector, \( X\in\mathbb{R}^{n\times p} \) is the design matrix, \( \beta^*\in\mathbb{R}^p \) is the true regression coefficient, and \( \epsilon\sim N(0,\sigma^2I_n) \) is the noise term.
- **Predictive risk**:
  \[ PR(m)=\mathbb{E}\left[\|Y - X\hat{\beta}_m\|^2_2\right] \]
- **False discovery rate**:
  \[ FDR(m)=\mathbb{E}\left[\frac{FP(m)}{\max(D_m, 1)}\right] \]
  where \( FP(m) \) is the number of variables included in model \( m \) but not in the true model \( m^* \), and \( D_m \) is the dimension of model \( m \).
- **Penalty function**:
  \[ \text{pen}(D_m)=K\sigma^2D_m \]
- **Selected model**:
  \[ \hat{m}(K)=\arg\min_{m\in M}\left\{\|Y - X\hat{\beta}_m\|^2_2+K\sigma^2D_m\right\} \]

### Conclusion

This paper successfully achieves the simultaneous control of predictive risk and false discovery rate in high-dimensional Gaussian linear regression by proposing a new hyperparameter calibration method. Theoretical results and experimental verification show that this method has high effectiveness and robustness in practical applications. In addition, the paper explores how to extend this method in the absence of a natural variable order, further broadening its application range.
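
### Illustrative sketch

The selection rule \( \hat{m}(K) \) from the formulas above can be illustrated with a short Python sketch. It is not the paper's calibration algorithm: it only evaluates the penalized least-squares criterion for a fixed \( K \) and a known \( \sigma^2 \). It assumes the candidate models are nested, so the model of dimension \( d \) uses the first \( d \) columns of \( X \), and that the true model \( m^* \) is the first `d_star` variables. The function names `select_ordered_model` and `false_discovery_proportion` are hypothetical, introduced here for illustration.

```python
import numpy as np


def select_ordered_model(X, y, K, sigma2):
    """Return the dimension d minimizing RSS(d) + K * sigma2 * d.

    Assumption: models are nested, i.e. model d uses columns 0..d-1 of X.
    """
    n, p = X.shape
    max_dim = min(n, p)
    crit = np.empty(max_dim + 1)
    crit[0] = float(y @ y)  # empty model: RSS = ||y||^2, no penalty
    for d in range(1, max_dim + 1):
        Xd = X[:, :d]
        # Least-squares fit restricted to the first d variables.
        beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta_hat
        crit[d] = float(resid @ resid) + K * sigma2 * d
    return int(np.argmin(crit))


def false_discovery_proportion(d_hat, d_star):
    """FP / max(D, 1) when the true model is the first d_star variables."""
    false_positives = max(d_hat - d_star, 0)
    return false_positives / max(d_hat, 1)


# Toy example: two strong signals followed by two negligible coefficients.
X = np.eye(6)[:, :4]
y = np.array([10.0, 10.0, 0.1, 0.1, 0.0, 0.0])
d_hat = select_ordered_model(X, y, K=2.0, sigma2=1.0)
print(d_hat, false_discovery_proportion(d_hat, d_star=2))  # prints: 2 0.0
```

In this toy example, setting `K=0.0` selects all four variables (a false discovery proportion of 0.5 against `d_star=2`), while a very large `K` selects the empty model. This mirrors the trade-off the summary describes between fitting the data and limiting false discoveries. The FDR itself is the expectation of this proportion over the noise, which would have to be approximated by averaging over simulated data sets.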

Trade-off between predictive performance and FDR control for high-dimensional Gaussian model selection

Faithful Variable Screening for High-Dimensional Convex Regression

Variable Selection for High Dimensional Gaussian Copula Regression Model: an Adaptive Hypothesis Testing Procedure.

Variable Selection via Adaptive False Negative Control in Linear Regression

Controlling the False Discovery Rate for Binary Feature Selection via Knockoff

False discovery control for penalized variable selections with high-dimensional covariates.

Variable Selection and Minimax Prediction in High-dimensional Functional Linear Model

Variable Selection in High-Dimensional Error-in-Variables Models Via Controlling the False Discovery Proportion

False Variable Selection Rates in Regression

Variable Selection for High-dimensional Cox Model with Error Rate Control

A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization

Controlling the False Discovery Rate in Subspace Selection

Optimal Feature Selection in High-Dimensional Discriminant Analysis

Directional FDR Control for Sub-Gaussian Sparse GLMs

Exact variable selection in sparse nonparametric models

High-Dimensional False Discovery Rate Control for Dependent Variables

Variable Selection Using Nonlocal Priors in High-Dimensional Generalized Linear Models With Application to fMRI Data Analysis

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Sparse PCA with False Discovery Rate Controlled Variable Selection

Post Selection Shrinkage Estimation for High Dimensional Data Analysis

Bayesian Controlled FDR Variable Selection via Knockoffs