Validating Clustering for Gene Expression Data Based on Semantic Distance of Gene Ontology Terms

Feizhen Wu, Wenli Ma, Mei Wang, QiLong Chen, Wenling Zheng
DOI: https://doi.org/10.1109/ICBBE.2008.172
2008-01-01
Abstract: Clustering algorithms for gene expression data attempt to partition the data into groups of genes that exhibit similar patterns of variation in expression level. Many clustering algorithms have been proposed, but little guidance is available for evaluating clustering results in terms of their biological meaning. We developed a new algorithm to measure the semantic distance between Gene Ontology (GO) terms. Based on this algorithm, we proposed a novel method to assess the biological predictive power of clustering algorithms: within a cluster, the more similar the functions of the genes, the lower the semantic distance. We applied the approach to evaluating hierarchical clustering algorithms on yeast cell and diabetes datasets, and successfully obtained the biological features of the gene clusters. We found that the approach may help achieve better clustering results.
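The abstract does not give the authors' distance formula, so the following is only a minimal sketch of the general idea of scoring a cluster by the semantic distance of its genes' GO annotations. It assumes a stand-in term distance (shortest path length in the GO graph treated as undirected), defines the distance between two genes as the closest pair of their annotated terms, and scores a cluster as the mean pairwise gene distance. The toy terms `GO:A` to `GO:F`, the gene names, and all function names are hypothetical and are not the paper's method or data.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy GO fragment: each term maps to its parent terms (is_a edges).
GO_PARENTS = {
    "GO:A": [],
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B"],
    "GO:E": ["GO:B"],
    "GO:F": ["GO:C"],
}


def undirected_graph(parents):
    """Build an undirected adjacency map from child -> parents edges."""
    adj = {term: set() for term in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj.setdefault(p, set()).add(child)
    return adj


def term_distance(adj, t1, t2):
    """Stand-in semantic distance (assumption): shortest path length via BFS."""
    if t1 == t2:
        return 0
    seen = {t1}
    queue = deque([(t1, 0)])
    while queue:
        node, d = queue.popleft()
        for nb in adj[node]:
            if nb == t2:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return float("inf")  # terms in disconnected components


def gene_distance(adj, terms1, terms2):
    """Distance between two genes: the closest pair of their annotated GO terms."""
    return min(term_distance(adj, a, b) for a in terms1 for b in terms2)


def cluster_score(adj, annotations, cluster):
    """Mean pairwise gene distance in a cluster; lower suggests more similar functions."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    total = sum(gene_distance(adj, annotations[g1], annotations[g2]) for g1, g2 in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    adj = undirected_graph(GO_PARENTS)
    # Hypothetical gene annotations.
    annotations = {"g1": ["GO:D"], "g2": ["GO:E"], "g3": ["GO:F"]}
    print(cluster_score(adj, annotations, ["g1", "g2"]))  # 2.0 (D-B-E)
    print(cluster_score(adj, annotations, ["g1", "g3"]))  # 4.0 (D-B-A-C-F)
```

Under these assumptions, the cluster `{g1, g2}`, whose terms share the parent `GO:B`, scores lower than `{g1, g3}`, whose terms meet only at the root, which mirrors the abstract's criterion that functionally similar clusters have lower semantic distance. Taking the minimum over term pairs is one simple choice; averaging over all term pairs is another plausible option.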
What problem does this paper attempt to address?