Sparse Orthogonal Nonnegative Matrix Factorization for Identifying Differentially Expressed Genes and Clustering Tumor Samples.

Ling-Yun Dai, Jin-Xing Liu, Rong Zhu, Xiang-Zhen Kong, Mi-Xiao Hou, Sha-Sha Yuan
DOI: https://doi.org/10.1109/bibm.2018.8621438
2018-01-01
Abstract: Gene expression data are critical for disease diagnosis and classification. However, gene expression data are usually high-dimensional and noisy. Matrix factorization methods are widely used for dimensionality reduction and data preprocessing in bioinformatics. In particular, nonnegative matrix factorization (NMF) offers strong interpretability in analyzing gene expression data because of its nonnegativity constraints. In this paper, a new NMF algorithm named sparse orthogonal nonnegative matrix factorization (SONMF) is proposed and applied to identify differentially expressed genes and cluster tumor samples. SONMF incorporates L1-norm regularization and an orthogonality constraint into the traditional NMF model to obtain a more powerful data analysis tool. An iterative algorithm is proposed to optimize the new objective function. To evaluate the effectiveness of the algorithm, SONMF is tested on four public gene expression datasets and compared with four other NMF methods. The experimental results on the four real tumor datasets confirm the effectiveness of SONMF for identifying differentially expressed genes and clustering tumor samples.
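The abstract does not give the SONMF objective function or its update rules, so the following is only a rough sketch of how L1-norm regularization and an orthogonality term can be combined with NMF in general. It assumes a penalized objective of the form 0.5*||X - WH||_F^2 + lam*sum(H) + (mu/4)*||H H^T - I||_F^2 and uses heuristic multiplicative updates. The function name `sparse_orthogonal_nmf`, the parameters `lam` and `mu`, the choice to penalize `H` rather than `W`, and the random test matrix are all assumptions made for illustration; none of them comes from the paper.

```python
import numpy as np

def sparse_orthogonal_nmf(X, k, lam=0.01, mu=0.1, n_iter=200, eps=1e-10, seed=0):
    """Illustrative sparse, softly orthogonal NMF (not the paper's algorithm).

    X    : nonnegative (genes x samples) matrix
    k    : number of components (e.g. expected number of clusters)
    lam  : weight of the L1 penalty on H (assumed placement)
    mu   : weight of the soft orthogonality penalty on the rows of H
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Standard multiplicative update for the basis matrix W.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Update for H: the negative gradient terms go in the numerator,
        # the positive ones (including lam and the orthogonality term)
        # in the denominator, so H stays nonnegative.
        numer = W.T @ X + mu * H
        denom = W.T @ W @ H + lam + mu * (H @ H.T @ H) + eps
        H *= numer / denom
    return W, H

# Toy usage on random nonnegative data (hypothetical, not a real dataset).
rng = np.random.default_rng(42)
X = rng.random((20, 30))          # 20 "genes", 30 "samples"
W, H = sparse_orthogonal_nmf(X, k=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
labels = H.argmax(axis=0)         # assign each sample to its dominant component
```

In a sketch like this, samples would typically be clustered by the largest entry in each column of `H`, and candidate genes would be ranked by their weights in the columns of `W`; whether the paper uses exactly these conventions cannot be determined from the abstract.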