FDR Control and Power Analysis for High-Dimensional Logistic Regression Via StabKoff

Panxu Yuan, Yinfei Kong, Gaorong Li
DOI: https://doi.org/10.1007/s00362-023-01501-5
2024-01-01
Statistical Papers
Abstract: Identifying significant variables in the high-dimensional logistic regression model is a fundamental problem in modern statistics and machine learning. This paper introduces a stability knockoffs (StabKoff) selection procedure that combines stability selection with knockoffs to conduct controlled variable selection for logistic regression. Under certain regularity conditions, we show that the proposed method achieves false discovery rate (FDR) control in the finite-sample setting, and that its power asymptotically approaches one as the sample size tends to infinity. We further develop an intersection strategy that better separates the knockoff statistics of significant variables from those of unimportant ones, which in some cases increases power. Simulation studies demonstrate that the proposed method has satisfactory finite-sample performance, in terms of both FDR and power, compared with existing methods. Finally, we apply the proposed method to a real data set on opioid use disorder treatment.
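StabKoff relies on the knockoff filter to turn per-variable statistics into a selection with controlled FDR. As a hedged illustration of that generic step only, the sketch below implements the standard knockoff+ threshold of Barber and Candès. It assumes statistics W_j where large positive values favor the original variable over its knockoff. How StabKoff builds W_j from stability-selection results, and how its intersection strategy works, are not reproduced here. The function names and the toy statistics are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(W: Sequence[float], q: float) -> float:
    """Knockoff+ threshold for target FDR level q (generic filter, not StabKoff itself).

    T = min{ t in {|W_j| : W_j != 0} :
             (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q },
    with T = +inf when no candidate t satisfies the bound (nothing selected).
    """
    # Candidate thresholds are the distinct nonzero magnitudes of W, in increasing order.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Negative statistics at or below -t estimate the number of false selections.
        false_est = 1 + sum(1 for w in W if w <= -t)
        # Positive statistics at or above t are the variables that would be selected.
        n_selected = max(1, sum(1 for w in W if w >= t))
        if false_est / n_selected <= q:
            return t
    return float("inf")


def knockoff_select(W: Sequence[float], q: float) -> List[int]:
    """Indices j with W_j >= T (hypothetical helper; assumes W is already computed)."""
    T = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= T]


if __name__ == "__main__":
    # Toy statistics, made up for illustration: five clearly positive values, five near zero.
    W = [0.9, 0.8, 0.7, 0.6, 0.5, -0.1, 0.05, 0.02, -0.03, 0.01]
    print(knockoff_select(W, q=0.25))
```

With these toy values the smallest threshold meeting the bound is 0.5, so the five large statistics are selected. The "+1" in the numerator is what gives knockoff+ its finite-sample FDR guarantee, at the cost of selecting nothing when very few variables are strong.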