False Variable Selection Rates in Regression

Max Grazier G'Sell,Trevor Hastie,Robert Tibshirani
DOI: https://doi.org/10.48550/arXiv.1302.2303
2013-02-10
Abstract:There has been recent interest in extending the ideas of False Discovery Rates (FDR) to variable selection in regression settings. Traditionally the FDR in these settings has been defined in terms of the coefficients of the full regression model. Recent papers have struggled with controlling this quantity when the predictors are correlated. This paper shows that this full model definition of FDR suffers from unintuitive and potentially undesirable behavior in the presence of correlated predictors. We propose a new false selection error criterion, the False Variable Rate (FVR), that avoids these problems and behaves in a more intuitive manner. We discuss the behavior of this criterion and how it compares with the traditional FDR, as well as presenting guidelines for determining which is appropriate in a particular setting. Finally, we present a simple estimation procedure for FVR in stepwise variable selection. We analyze the performance of this estimator and draw connections to recent estimators in the literature.
Methodology
What problem does this paper attempt to address?