Reproducible Feature Selection for High-Dimensional Measurement Error Models
Xin Zhou, Yang Li, Zemin Zheng, Jie Wu, Jiarui Zhang
DOI: https://doi.org/10.1287/ijoc.2023.0282
IF: 3.288
2024-11-09
INFORMS Journal on Computing
Abstract: The literature has witnessed an upsurge of interest in dealing with corrupted data in diverse operations research and optimization applications. Despite the substantial progress of feature selection, how to control the false discovery rate (FDR) under measurement errors remains largely unexplored, especially in the knockoffs framework. In this paper, we extend the recently developed knockoff procedures designed for clean data sets to deal with corrupted data. To be specific, we propose a new method called the double projection knockoff filter (DP-knockoff) for reproducible feature selection under additive measurement errors in the high-dimensional setup. Our key contribution is to show that the FDR of the proposed DP-knockoff can be asymptotically controlled within a user-specified level. This is nontrivial because there is no way to obtain the exact knockoff copies due to the unobservable measurement errors. We address this issue by resorting to certain bias-corrected test statistics. Our numerical studies and real data analysis demonstrate the effectiveness of the proposed procedure.
History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.
Funding: Financial support from the National Key Research and Development Program of China [Grant 2022YFA1008000], the Natural Science Foundation of China [Grants 11671374, 12101584, 71731010, 71921001, and 72071187], the Fundamental Research Funds for the Central Universities [Grants WK3470000017 and WK2040000047], the Doctoral Research Start-up Funds Projects of Anhui University [Grant S020318033/005], and the University Natural Science Research Project of Anhui Province [Grant 2023AH050101] is gratefully acknowledged.
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0282), as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2023.0282). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
computer science, interdisciplinary applications; operations research & management science
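
Illustrative sketch: the abstract refers to controlling the FDR at a user-specified level within the knockoffs framework. The Python snippet below shows only the generic knockoff+ selection step from that framework, under the assumption that a vector W of feature statistics is already available (large positive values favor the original feature over its knockoff copy). It is not the DP-knockoff procedure: the bias-corrected statistics proposed in the paper are not reproduced here. The function name knockoff_plus_select and the example values of W are hypothetical.

```python
import numpy as np


def knockoff_plus_select(W, q):
    """Generic knockoff+ selection step (not the paper's DP-knockoff).

    W : knockoff statistics, one per feature (assumed already computed).
    q : target FDR level.
    Returns (indices of selected features, threshold used).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Conservative FDP estimate: (1 + #{W_j <= -t}) / max(1, #{W_j >= t}).
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t), float(t)
    # No threshold meets the target level: select nothing.
    return np.array([], dtype=int), float("inf")


# Made-up statistics for ten features, for illustration only.
W_example = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, 1.5, 6.0]
selected, threshold = knockoff_plus_select(W_example, q=0.2)
print(selected, threshold)
```

The smallest threshold whose estimated false discovery proportion falls below q is used, so the step selects as many features as the target level allows; when no threshold qualifies, nothing is selected.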