Multi-View Random-Walk Graph Regularization Low-Rank Representation for Cancer Clustering and Differentially Expressed Gene Selection

Juan Wang, Li-Hong Wang, Jin-Xing Liu, Xiang-Zhen Kong, Sheng-Jun Li
DOI: https://doi.org/10.1109/jbhi.2022.3151333
IF: 7.7
2022-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: Cancer genome data generally consist of multiple views from different sources. These views provide different levels of information about gene activity and, taken together, a more comprehensive picture of the cancer. The low-rank representation (LRR) method, a powerful subspace clustering method, has been extended and applied in cancer data research. Multi-view learning methods based on low-rank representation have achieved good results in cancer multi-omics analysis because they fully consider the consistency and complementarity between views, but they fall short in mining the latent local geometry of the data. To address this, the paper proposes a new method named Multi-view Random-walk Graph regularization Low-Rank Representation (MRGLRR) for the comprehensive analysis of multi-view genomics data. The method uses a multi-view model to find a common centroid across views. By constructing a joint affinity matrix to learn the low-rank subspace representation of multiple data sets, it captures the hidden information of each view. In addition, the method introduces a random-walk graph regularization constraint to obtain more accurate similarities between samples. Unlike traditional graph regularization, after constructing the KNN graph, the method uses a random walk algorithm to obtain the weight matrix. The random walk algorithm can retain more local geometric information and better learn the topological structure of the data. Furthermore, a feature gene selection strategy suited to the multi-view model is proposed to find more differentially expressed genes with research value. Experimental results show that the method outperforms other representative methods in clustering and feature gene selection on cancer multi-omics data.
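The abstract's step of building a KNN graph and then deriving the weight matrix by random walk can be illustrated with a minimal sketch. The abstract does not give the transition rule, the number of steps, or how the weights are read off the walk, so everything below is an assumption made for illustration: a row-normalized transition matrix, an average over the first few walk steps, and a final symmetrization. The function names (`knn_adjacency`, `random_walk_weights`, `graph_laplacian`) and the parameters `k` and `steps` are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 KNN adjacency; rows of X are samples (requires k < n)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    # Keep an edge if either endpoint selected the other.
    return np.maximum(A, A.T)


def random_walk_weights(A, steps=3):
    """Assumed weighting: average of the 1..steps step transition matrices."""
    # Row-stochastic transition matrix; every node has degree >= k >= 1.
    P = A / A.sum(axis=1, keepdims=True)
    W = np.zeros_like(P)
    Pt = np.eye(A.shape[0])
    for _ in range(steps):
        Pt = Pt @ P  # transition probabilities after one more step
        W += Pt
    W /= steps
    # Symmetrize so W can serve as an undirected graph weight matrix.
    return 0.5 * (W + W.T)


def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W used in graph regularization terms."""
    return np.diag(W.sum(axis=1)) - W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two loose groups of samples with 5 features each.
    X = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(4.0, 1.0, (10, 5))])
    W = random_walk_weights(knn_adjacency(X, k=3), steps=3)
    L = graph_laplacian(W)
    print(W.shape, L.shape)
```

In this sketch, samples that are not direct KNN neighbors can still receive a nonzero weight when short walks connect them, which is one plausible reading of how a random walk retains more local geometry than the raw KNN graph.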