A screening method for ultra-high dimensional features with overlapped partition structures
Jie He,Jiali Song,Xiao-Hua Zhou,Yan Hou
DOI: https://doi.org/10.1177/09622802221129043
IF: 2.494
2023-01-01
Statistical Methods in Medical Research
Abstract:Ultra-high dimensional data, such as gene and neuroimaging data, are becoming increasingly important in biomedical science. Identifying important biomarkers from the huge number of features can help us gain better insights into further researches. Variable screening is an efficient tool to achieve this goal under the large scale cases, which reduces the dimension of features into a moderate size by removing the major part of inactive ones. Developing novel variable screening methods for high-dimensional features with group structures is challenging, especially under the overlapped cases. For example, the huge-scaled genes usually can be partitioned into hundreds of pathways according to background knowledge. One primary characteristic for this type of data is that many genes may appear across more than one pathway, which means that different pathways are overlapped. However, existing variable screening methods only could deal with disjoint group structure cases. To fill this gap, we propose a novel variable screening method for the generalized linear model by incorporating overlapped partition structures with theoretical guarantee. Besides the sure screening property, we also test the performance of the proposed method through a series of numerical studies and apply it to statistical analysis of a breast cancer data.