Ensemble Non-negative Matrix Factorization for Clustering Biomedical Documents

Shanfeng Zhu, Wei Yuan, Fei Wang
2008-01-01
Abstract: Searching and mining biomedical literature databases, such as MEDLINE, is the main source of scientific hypotheses for biomedical researchers. By grouping similar documents together, clustering techniques can help users efficiently find documents of interest. Since non-negative matrix factorization (NMF) can effectively capture the latent semantic space with non-negative factors for both the basis and the weights, it has been used to cluster general text documents. Because NMF results vary with the random initialization, we propose using ensemble NMF for biomedical document clustering. The performance of ensemble NMF was evaluated by clustering a large number of datasets generated from the TREC Genomics track dataset. The experimental results show that our method significantly outperforms the classical clustering algorithms bisecting k-means, k-means, and hierarchical clustering on most of the datasets.
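The abstract does not say how the ensemble is built or combined. As a hedged illustration only, the sketch below assumes one common construction. It runs NMF (Lee-Seung multiplicative updates for the Frobenius-norm objective) several times from different random initializations on a terms x documents matrix X ~ W H, assigns each document to the latent factor with its largest weight in H, and accumulates a co-association matrix C whose entry C[i, j] is the fraction of runs that grouped documents i and j together. Final clusters come from majority voting: documents co-clustered in more than half the runs are linked, and connected components become clusters. This voting step is a swapped-in, evidence-accumulation-style combiner, not necessarily the authors' method. The function names `nmf`, `co_association`, and `consensus_labels` and the parameters `n_runs`, `n_iter`, and `threshold` are hypothetical.

```python
import numpy as np


def nmf(X, k, rng, n_iter=300, eps=1e-9):
    """Factorize a non-negative matrix X (terms x documents) as X ~ W @ H
    with Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    m, n = X.shape
    W = rng.random((m, k)) + eps  # random non-negative initialization:
    H = rng.random((k, n)) + eps  # the source of run-to-run variability
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def co_association(X, k, n_runs=20, seed=0):
    """Fraction of NMF runs in which each pair of documents shares a cluster.
    In a single run, a document's cluster is the row of H with its largest weight."""
    rng = np.random.default_rng(seed)
    n_docs = X.shape[1]
    C = np.zeros((n_docs, n_docs))
    for _ in range(n_runs):
        _, H = nmf(X, k, rng)
        labels = H.argmax(axis=0)
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / n_runs


def consensus_labels(C, threshold=0.5):
    """Hypothetical combiner: link documents co-clustered in more than
    `threshold` of the runs and return connected components as clusters."""
    n = C.shape[0]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i, j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy usage: two topics with disjoint vocabularies.
X = np.zeros((6, 6))
X[:3, :3] = np.outer([1.0, 2.0, 1.0], [1.0, 1.0, 2.0])  # terms 0-2 in docs 0-2
X[3:, 3:] = np.outer([2.0, 1.0, 1.0], [1.0, 2.0, 1.0])  # terms 3-5 in docs 3-5
C = co_association(X, k=2)
print(consensus_labels(C))
```

Because majority voting does not force exactly k groups, the number of final clusters can differ from k. A variant that needs exactly k clusters could instead cluster the co-association matrix itself, for example with hierarchical clustering.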