K-means Based Unsupervised Feature Selection to Prioritize Biomarkers of Different Disease Clinical Phases

Xue Jiang, Weidi Wang, Jing Xu, Zhen Wang, Guan Ning Lin
DOI: https://doi.org/10.1101/2020.04.21.052704
2020-01-01
Abstract: Huntington’s disease (HD) is caused by a single gene mutation, which makes it a potentially good model for developing biomarkers that correspond to different disease phases and clinical phenotypes. Hypothesis-driven and omics discovery approaches have not yet identified effective candidate biomarkers in HD, so there is an urgent need to develop engagement and disease-phase-specific biomarkers. Advances in sequencing technology make it possible to develop data-driven methods for biomarker discovery. In this study, we therefore designed a k-means based unsupervised feature selection (KFS) method to prioritize biomarkers of different disease clinical phases. KFS first conducts k-means clustering on the samples using gene expression data, then conducts feature selection based on the feature selection matrix to prioritize biomarkers of different samples. By alternating between clustering and feature selection, KFS screens key genes that correspond to the complex clinical phenotypes of different disease phases. Further gene ontology and enrichment analysis highlight potential molecular mechanisms of HD. Our experimental analyses uncovered new disease-related genes and disease-associated pathways, which in turn provide insight into the molecular mechanisms at work during disease progression.

Competing Interest Statement: The authors have declared no competing interest.
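The alternation between clustering and feature selection described in the abstract can be illustrated with a small sketch. This is not the authors' KFS implementation: the abstract does not define the feature selection matrix or the objective being optimized, so the sketch assumes a simple stand-in score (the share of each gene's variance that lies between clusters). The function names (`kmeans`, `feature_scores`, `alternate_select`), the farthest-first initialization, and the toy expression matrix are all hypothetical and chosen only to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=20):
    """Lloyd's k-means with farthest-first initialization (an assumption,
    used here so the result does not depend on a random seed)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(rows, labels, k):
    """Assumed stand-in score: between-cluster sum of squares divided by
    total sum of squares, computed per gene (column)."""
    n = len(rows)
    scores = []
    for col in zip(*rows):
        mean = sum(col) / n
        total = sum((x - mean) ** 2 for x in col)
        between = 0.0
        for c in range(k):
            vals = [x for x, lab in zip(col, labels) if lab == c]
            if vals:
                between += len(vals) * (sum(vals) / len(vals) - mean) ** 2
        scores.append(between / total if total > 0 else 0.0)
    return scores


def alternate_select(rows, k, n_keep, n_rounds=5):
    """Alternate clustering on the selected genes with re-scoring of all
    genes until the selected set stops changing."""
    selected = list(range(len(rows[0])))
    labels = []
    for _ in range(n_rounds):
        sub = [[r[j] for j in selected] for r in rows]
        labels = kmeans(sub, k)
        scores = feature_scores(rows, labels, k)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j],
                        reverse=True)
        new_selected = sorted(ranked[:n_keep])
        if new_selected == selected:
            break
        selected = new_selected
    return selected, labels


# Toy expression matrix (invented): 6 samples x 4 genes. Genes 0 and 1
# differ between the two sample groups; genes 2 and 3 do not.
toy = [
    [0.0, 0.2, 1.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
    [0.2, 0.1, 1.0, 1.0],
    [5.0, 5.1, 1.0, 0.0],
    [5.1, 5.0, 0.0, 1.0],
    [5.2, 5.2, 1.0, 1.0],
]
selected, labels = alternate_select(toy, k=2, n_keep=2)
print(selected, labels)
```

On this toy matrix the loop keeps genes 0 and 1 and splits the samples into their two groups. Real expression data would need normalization, a principled choice of `k` and `n_keep`, and the scoring rule actually defined in the paper.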