Integration of cluster ensemble and EM based text mining for microarray gene cluster identification and annotation.

Xiaohua Hu, Xiaodan Zhang, Xiaohua Zhou
DOI: https://doi.org/10.1145/1183614.1183749
2006-01-01
Abstract: In this paper, we design and develop a unified system, GE-Miner (Gene Expression Miner), that integrates cluster ensemble, text clustering, and multi-document summarization to provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach that generates high-quality gene clusters. In our text summarization module, given a gene cluster, our Expectation Maximization (EM) based algorithm can automatically identify subtopics and extract the most probable terms for each subtopic. The top k terms extracted from each subtopic are then combined to form the biological explanation of the gene cluster. Experimental results demonstrate that our system obtains high-quality clusters and provides informative key terms for the gene clusters.
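The abstract does not specify the probabilistic model behind the EM-based summarizer, so the following is only a minimal sketch under a stated assumption: the documents linked to one gene cluster are modelled as a mixture of unigram (multinomial) distributions, one per subtopic, fitted with EM. The top k terms of each subtopic are then merged into a single de-duplicated term list, mirroring the "combine top k terms per subtopic" step. The function names `em_topic_terms` and `cluster_explanation`, the `smoothing` parameter, and the random initialization are hypothetical choices for illustration, not the authors' method.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def em_topic_terms(
    docs: List[List[str]],
    n_topics: int,
    top_k: int = 3,
    n_iter: int = 50,
    smoothing: float = 0.01,  # hypothetical additive smoothing; must be > 0
    seed: int = 0,
) -> Tuple[List[List[str]], List[List[float]]]:
    """Fit an assumed mixture-of-unigrams model with EM and return the
    top_k most probable terms per subtopic plus the per-document
    subtopic responsibilities."""
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]
    rng = random.Random(seed)

    # Start from random soft assignments of documents to subtopics.
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(n_topics)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: mixing weights and smoothed term distributions per subtopic.
        weights = [sum(r[k] for r in resp) / len(docs) for k in range(n_topics)]
        theta = []
        for k in range(n_topics):
            mass = {w: smoothing for w in vocab}
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    mass[w] += r[k] * n
            norm = sum(mass.values())
            theta.append({w: m / norm for w, m in mass.items()})

        # E-step: posterior P(subtopic | document), computed in log space
        # and normalized with the log-sum-exp trick for numerical stability.
        resp = []
        for c in counts:
            logs = [
                math.log(weights[k] + 1e-12)
                + sum(n * math.log(theta[k][w]) for w, n in c.items())
                for k in range(n_topics)
            ]
            top = max(logs)
            exps = [math.exp(x - top) for x in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])

    # Most probable terms of each subtopic (ties broken alphabetically).
    topics = [
        [w for w, _ in sorted(t.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for t in theta
    ]
    return topics, resp


def cluster_explanation(topics: List[List[str]]) -> List[str]:
    """Merge per-subtopic terms into one list, keeping first-seen order."""
    seen: Dict[str, None] = {}
    for terms in topics:
        for w in terms:
            seen.setdefault(w, None)
    return list(seen)


# Usage on a toy set of tokenized abstracts for one hypothetical gene cluster.
toy_docs = [
    "apoptosis caspase cell death".split(),
    "caspase apoptosis signaling".split(),
    "ribosome translation protein".split(),
    "translation ribosome rna".split(),
]
toy_topics, _ = em_topic_terms(toy_docs, n_topics=2, top_k=2)
print(cluster_explanation(toy_topics))
```

EM on a mixture model only finds a local optimum, so the terms chosen for each subtopic can depend on `seed`; a real system would typically run several restarts or use a more informed initialization.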