Robust inference with knockoffs

Rina Foygel Barber, Emmanuel J Candès, Richard J Samworth
2020-06-01
Abstract: We consider the variable selection problem, which seeks to identify important variables influencing a response Y out of many candidate features X₁, …, Xₚ. We wish to do so while offering finite-sample guarantees about the fraction of false positives, that is, selected variables Xⱼ that in fact have no effect on Y after the other features are known. When the number of features p is large (perhaps even larger than the sample size n), and we have no prior knowledge regarding the type of dependence between Y and X, the model-X knockoffs framework nonetheless allows us to select a model with a guaranteed bound on the false discovery rate, as long as the distribution of the feature vector X = (X₁, …, Xₚ) is exactly known. This model selection procedure operates by constructing “knockoff copies” of each of the p features, which are then used as a control group to ensure that the model selection algorithm is not choosing …
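The knockoff idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions that are not in the abstract: the features are taken to be i.i.d. standard Gaussian (so that an independent fresh Gaussian draw is a valid knockoff copy), the importance statistic is a simple difference of absolute inner products, and selection uses the standard knockoff+ threshold at a target false discovery rate q. All function names here are hypothetical.

```python
import math
import random


def sample_knockoffs(X, rng):
    """Draw knockoff copies for the special case of i.i.d. N(0, 1) features.

    ASSUMPTION: the feature distribution is known to be N(0, I). Only in that
    case is an independent N(0, I) draw a valid knockoff; correlated features
    need a different construction.
    """
    return [[rng.gauss(0.0, 1.0) for _ in row] for row in X]


def importance_statistics(X, X_knock, y):
    """W_j = |<X_j, y>| - |<knockoff_j, y>| (an illustrative choice).

    A large positive W_j means the real feature looks more related to y
    than its knockoff "control" does.
    """
    p = len(X[0])
    W = []
    for j in range(p):
        real = abs(sum(row[j] * yi for row, yi in zip(X, y)))
        fake = abs(sum(row[j] * yi for row, yi in zip(X_knock, y)))
        W.append(real - fake)
    return W


def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)
        positives = sum(1 for w in W if w >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return math.inf  # no threshold meets the target: select nothing


def knockoff_select(W, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    tau = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= tau]


if __name__ == "__main__":
    # Simulated data (hypothetical sizes): the first 5 of 30 features matter.
    rng = random.Random(0)
    n, p = 200, 30
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * sum(row[:5]) + rng.gauss(0.0, 1.0) for row in X]
    W = importance_statistics(X, sample_knockoffs(X, rng), y)
    print(knockoff_select(W, q=0.2))
```

The negative statistics play the role of the control group: for a null feature, W_j is equally likely to be positive or negative, so the count of large negative values estimates how many large positive values are false discoveries.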