Model free feature screening for large scale and ultrahigh dimensional survival data

Pan, Yingli; Wang, Haoyu; Liu, Zhan
DOI: https://doi.org/10.1007/s10463-024-00912-x
2024-10-20
Annals of the Institute of Statistical Mathematics
Abstract: This paper provides a novel perspective on feature screening for high-dimensional, right-censored, large-p-large-N survival data. It introduces a distributed feature screening method, Aggregated Distance Correlation Screening (ADCS). The proposed framework expresses the distance correlation measure as a function of several component parameters, each of which can be estimated in a distributed manner by a natural U-statistic computed on data segments. Aggregating the component estimates yields a final correlation estimate, which is then used for feature screening. Importantly, the approach requires no model specification for the responses or the predictors and remains effective with heavy-tailed data. The study establishes the consistency of the aggregated correlation estimator under mild conditions and demonstrates the sure screening property of ADCS. Empirical results on both simulated and real datasets confirm the efficacy and practicality of the proposed approach.
statistics & probability
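The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it ignores right-censoring entirely (the abstract does not say how censoring is handled), treats each predictor as a scalar, and uses the standard decomposition of the squared distance covariance, E[a12 b12] + E[a12] E[b12] - 2 E[a12 b13], where a and b are pairwise absolute distances. The function names (`segment_components`, `aggregated_dcor`, `screen`), the choice of eight component parameters, and the segment-size weighting are all assumptions made for illustration.

```python
import numpy as np


def segment_components(x, y):
    """U-statistic estimates of the component parameters on one data segment.

    Assumes scalar x and y, no censoring, and at least 3 observations.
    """
    n = len(x)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances, zero diagonal
    b = np.abs(y[:, None] - y[None, :])
    pairs = n * (n - 1)                  # number of ordered pairs i != j
    triples = n * (n - 1) * (n - 2)      # ordered triples of distinct indices
    ra, rb = a.sum(axis=1), b.sum(axis=1)

    def pair_mean(u, v):
        # estimates E[u_12 * v_12] over pairs i != j
        return (u * v).sum() / pairs

    def triple_mean(ru, rv, u, v):
        # estimates E[u_12 * v_13] over distinct i, j, k
        return ((ru * rv).sum() - (u * v).sum()) / triples

    return np.array([
        pair_mean(a, b), a.sum() / pairs, b.sum() / pairs,
        triple_mean(ra, rb, a, b),
        pair_mean(a, a), triple_mean(ra, ra, a, a),
        pair_mean(b, b), triple_mean(rb, rb, b, b),
    ])


def aggregated_dcor(x, y, n_segments):
    """Average component estimates across segments, then plug them in."""
    xs = np.array_split(x, n_segments)
    ys = np.array_split(y, n_segments)
    comps = [segment_components(u, v) for u, v in zip(xs, ys)]
    sizes = [len(u) for u in xs]
    # Weighting by segment size is an illustrative choice.
    ab, ea, eb, ab3, aa, aa3, bb, bb3 = np.average(comps, axis=0, weights=sizes)
    dcov2 = ab + ea * eb - 2.0 * ab3
    dvar_x = aa + ea ** 2 - 2.0 * aa3
    dvar_y = bb + eb ** 2 - 2.0 * bb3
    denom = np.sqrt(max(dvar_x, 0.0) * max(dvar_y, 0.0))
    return dcov2 / denom if denom > 0 else 0.0


def screen(X, y, n_segments, keep):
    """Rank features by aggregated squared distance correlation; keep the top ones."""
    scores = np.array([aggregated_dcor(X[:, j], y, n_segments)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:keep], scores
```

Each segment needs only its own rows to compute its component vector, so in a distributed setting a worker would return eight numbers per feature and the plug-in step would run once on the aggregated values. Calling `screen(X, y, n_segments=3, keep=5)` on an `(n, p)` array `X` and a length-`n` response `y` returns the indices of the five highest-scoring features along with all scores.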