Adaptive sufficient sparse clustering by controlling false discovery
Zihao Yuan, Jiaqing Chen, Han Qiu, Houxiang Wang, Yangxin Huang
DOI: https://doi.org/10.1007/s11222-024-10507-4
IF: 2.3241
2024-10-08
Statistics and Computing
Abstract: Sparse clustering divides samples into distinct groups while simultaneously reducing dimensionality in ultra-high dimensional unsupervised learning. To achieve sparse clustering that can screen out noise variables, an adaptive sparse clustering framework based on sufficient variable screening, abbreviated as adaptive sufficient sparse clustering (ASSC), is developed by controlling false discovery. Without assuming any specific model, ASSC employs a composite hypothesis testing procedure that leverages conditional marginal correlations of variables across distinct groups, aiming to pinpoint sufficient variables via sparse clustering and to improve the performance of sparse clustering in ultra-high dimensionality. To control false discoveries of the hypothesis testing procedure at a predetermined level, ASSC provides an adaptive threshold for identifying conditional marginal dependency, which ensures accurate clustering with high probability. Under mild conditions, the sufficient screening and sufficient clustering properties of ASSC are established based on the variable screening procedure that controls false discovery. Numerical studies on synthetic and real datasets corroborate the performance and flexibility of ASSC and underscore its utility in unsupervised learning.
statistics & probability; computer science, theory & methods