Identification of disease-related genes based on tensor factorization with RNA-seq data of Huntington's disease mice

Xue Jiang
DOI: https://doi.org/10.1109/CAC.2017.8242731
2017-01-01
Abstract: In recent years, the development of next-generation sequencing technology has generated large amounts of omics data, making it possible to explore the molecular mechanisms of Huntington's disease computationally at a genome-wide scale. Because the pathological mechanisms of neurodegenerative diseases are complicated, traditional computational methods cannot effectively identify the most disease-related genes. In this paper, we propose a new approach based on tensor factorization (TFR) to analyze RNA-seq data from Huntington's disease. Based on this approach, we also design a new framework to identify disease-related genes. First, TFR maps the RNA-seq data into three low-dimensional spaces: the gene space, the sample space and the time space. We assume that the common components obtained by TFR in the three subspaces represent the hidden biological signals that affect gene expression. Then, the disease-related biological signals are selected, and a ranked list is obtained by sorting the genes according to the gene expression values shaped by those signals. Its ability to extract dependence structures in gene expression data makes TFR more robust and efficient at identifying disease-related genes. Experimental results on RNA-seq data from Huntington's disease mice demonstrate that TFR outperforms the traditional methods: it improves both the identification accuracy of disease-related genes and the precision of the top-ranked genes.
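The pipeline described in the abstract (factorize a gene x sample x time tensor, pick the disease-related component, rank genes by it) can be illustrated with a minimal sketch. The abstract does not say which factorization TFR uses or how disease-related signals are selected, so everything below is an assumption made for illustration: a plain CP decomposition fitted by alternating least squares stands in for TFR, the disease-related component is taken to be the one whose sample-space loadings differ most between disease and control samples, and genes are scored by the magnitude of their loading on that component. The function names (unfold, khatri_rao, cp_als, rank_genes) and the is_disease mask are hypothetical and do not come from the paper.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])


def cp_als(tensor, rank, n_iter=300, seed=0):
    """CP factorization of a 3-way tensor by alternating least squares.

    Assumed stand-in for TFR; returns [gene, sample, time] factor matrices.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # The two factors held fixed, in increasing mode order so that
            # their Khatri-Rao product matches the C-order unfolding.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(tensor, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def rank_genes(tensor, is_disease, rank):
    """Rank genes by their loading on the assumed disease-related component.

    `is_disease` is a boolean NumPy array over the sample axis (hypothetical input).
    """
    genes, samples, _ = cp_als(tensor, rank)
    # Normalize columns so the selection ignores CP's scale ambiguity.
    unit = samples / np.linalg.norm(samples, axis=0)
    contrast = np.abs(unit[is_disease].mean(axis=0) - unit[~is_disease].mean(axis=0))
    component = int(np.argmax(contrast))
    scores = np.abs(genes[:, component])
    return np.argsort(-scores), component
```

Calling rank_genes(X, is_disease, rank=R) on an expression tensor X of shape (genes, samples, time points) returns gene indices ordered from most to least associated with the selected component, together with the index of that component. The choice of R and the selection criterion would need to be tuned against real data; neither is specified in the abstract.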