FEATURE SELECTION FOR CLUSTERING DISEASE SAMPLES BASED ON GENE ONTOLOGY

XU Jian-zhen, GUO Zheng, LI Xia, LI Yong-jin, LIU Shuai, TU Kang
DOI: https://doi.org/10.3321/j.issn:1000-6737.2005.03.004
2005-01-01
ACTA BIOPHYSICA SINICA
Abstract: Analysis of two microarray datasets, for leukemia and lymphoma, demonstrated that disease subtypes can be clustered well using the 10% of genes whose expression varies most across disease samples. The feature genes that carry strong clustering information were shown to have different distribution characteristics in the two disease datasets. Based on these observations, a new method was proposed that combines gene expression profiles with gene functional knowledge to select feature genes for clustering disease samples. After each gene is annotated to defined functional classes in Gene Ontology, the disease-relevant functional classes that are significantly enriched with differentially expressed genes are identified, and the disease samples are then clustered using the differentially expressed genes contained in these classes. The experimental results showed that the new clustering procedure performs better than the traditional procedure. In addition, the new approach directly yields insight into biological function. Two feature gene sets, which may be functionally relevant to leukemia and lymphoma respectively, were extracted.
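The selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the enrichment statistic, so an upper-tail hypergeometric test is assumed here, and the function names, the `alpha` threshold, and the use of the top-variance genes as a stand-in for "differentially expressed genes" are all hypothetical. The final clustering step is omitted.

```python
from math import comb
from statistics import pvariance


def top_variance_genes(expr, fraction=0.10):
    """Return the genes whose expression varies most across samples.

    expr maps a gene name to its list of expression values (one per sample).
    fraction is the share of genes to keep (0.10 mirrors the "top 10%").
    """
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def enrichment_p_value(total, selected, class_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Assumed test: draw `selected` genes from `total` annotated genes and ask
    how likely it is that at least `overlap` of them fall in a functional
    class containing `class_size` genes.
    """
    upper = min(selected, class_size)
    tail = sum(
        comb(class_size, k) * comb(total - class_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


def enriched_class_genes(selected, annotations, background, alpha=0.05):
    """Collect the selected genes that sit in significantly enriched classes.

    annotations maps a functional class label to its member genes;
    background is the set of all annotated genes. No multiple-testing
    correction is applied in this sketch.
    """
    background = set(background)
    selected = set(selected) & background
    features = set()
    for members in annotations.values():
        members = set(members) & background
        hits = selected & members
        if not hits:
            continue
        p = enrichment_p_value(
            len(background), len(selected), len(members), len(hits)
        )
        if p < alpha:
            features |= hits
    return features
```

The resulting feature set would then be passed to whatever clustering algorithm is used for the disease samples. In practice a multiple-testing correction across functional classes would likely be needed, since many classes are tested at once.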