SCGRNs: Novel supervised inference of single-cell gene regulatory networks of complex diseases

Turki Turki, Y-h. Taguchi
DOI: https://doi.org/10.1016/j.compbiomed.2020.103656
IF: 7.7
2020-03-01
Computers in Biology and Medicine
Abstract: Single-cell gene regulatory network (SCGRN) inference refers to the process of inferring gene regulatory networks from single-cell data, which are generated via single-cell RNA-sequencing (scRNA-seq) technologies. Although scRNA-seq leads to the generation of data pertaining to cells of particular interest, the single-cell data are noisy and highly sparse, which makes the analysis of such data a challenging task. In this study, we model an SCGRN as a directed graph where an edge from a source node (also called transcription factor (TF)) to a target node (also called target gene) indicates that a TF regulates a target gene. Inferring the SCGRN via predicting TF-target gene regulations would help biologists better understand various diseases in terms of networks. Following the modeling step, we propose three machine learning approaches. The first approach considers feature vectors encoding regulatory relationships of expressed TFs-target genes as input. The resulting model is then used to predict unseen TF-target gene regulations. The second machine learning approach constructs new feature vectors via incorporating features obtained from stacked autoencoders, which are provided to a machine learning algorithm to induce a model and predict unseen regulations of TFs-target genes. The third approach extends the second approach via including topological features extracted from an SCGRN. We perform an experimental study comparing our approaches against adapted unsupervised approaches. Experimental results on SCGRNs pertaining to healthy and type 2 pancreatic diabetes demonstrate the clinical importance and the accurate prediction performance of the proposed approaches.
Keywords: engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology
What problem does this paper attempt to address?
This paper addresses the inference of single-cell gene regulatory networks (SCGRNs), particularly for complex diseases. Specifically, the authors focus on how to infer gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNA-seq) data. Because single-cell data are noisy and highly sparse, analyzing them is challenging. The paper therefore proposes supervised learning methods for inferring gene regulatory networks at the single-cell level, with the aim of helping biologists better understand disease mechanisms in terms of molecular networks.

The main contributions of the paper are:
1. A formal supervised-learning formulation of gene regulatory network inference from scRNA-seq data, which lays a foundation for future research.
2. Three machine learning approaches, evaluated with the XGBoost, support vector machine (SVM), and DeepBoost algorithms and compared against adapted unsupervised methods.
3. An application of these approaches to single-cell data from healthy pancreas and type 2 diabetic pancreas to infer the corresponding gene regulatory networks.
4. Experimental results showing that the proposed approaches outperform the adapted unsupervised methods at inferring gene regulatory networks from single-cell data.

Through this work, the authors aim to infer gene regulatory networks at the single-cell level more accurately and thereby support a deeper understanding of the biological mechanisms of complex diseases.
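To make the supervised framing concrete, here is a minimal Python sketch of how TF-target pairs could be turned into feature vectors and classified. It is an illustration under stated assumptions, not the authors' implementation: the expression values, gene names, and labels are made up; the feature encoding (concatenating the two expression profiles) is an assumption, since the abstract does not specify it; the topological features are simple degree counts chosen for illustration; and a nearest-centroid classifier stands in for XGBoost, SVM, and DeepBoost so that the example needs only the standard library. The stacked-autoencoder features of the second approach are omitted.

```python
# Toy, made-up data: expression of each gene across four cells (sparse, as in scRNA-seq).
expression = {
    "TF_A": [0.0, 2.0, 0.0, 3.0],
    "TF_B": [1.0, 0.0, 0.0, 0.0],
    "G1": [0.0, 1.5, 0.0, 2.5],
    "G2": [0.0, 0.0, 4.0, 0.0],
    "G3": [0.5, 0.0, 0.0, 0.0],
}

# Hypothetical labeled TF -> target pairs: 1 = regulation, 0 = no regulation.
labeled_pairs = {
    ("TF_A", "G1"): 1,
    ("TF_B", "G3"): 1,
    ("TF_A", "G2"): 0,
    ("TF_B", "G2"): 0,
}

# The known network as a directed graph: one edge per positive pair.
edges = {pair for pair, label in labeled_pairs.items() if label == 1}


def topological_features(tf, target):
    """Out-degree of the TF and in-degree of the target (illustrative choice).

    The pair being described is excluded so its own label does not leak in.
    """
    out_deg = sum(1 for s, t in edges if s == tf and t != target)
    in_deg = sum(1 for s, t in edges if t == target and s != tf)
    return [out_deg, in_deg]


def feature_vector(tf, target):
    """Assumed encoding: TF profile + target profile + topological features."""
    return expression[tf] + expression[target] + topological_features(tf, target)


def centroid(vectors):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(pairs):
    """Nearest-centroid stand-in for the classifiers named in the paper."""
    by_label = {0: [], 1: []}
    for (tf, target), label in pairs.items():
        by_label[label].append(feature_vector(tf, target))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, tf, target):
    """Return the label whose centroid is closest to the pair's feature vector."""
    x = feature_vector(tf, target)
    return min(model, key=lambda label: squared_distance(x, model[label]))


model = train(labeled_pairs)
prediction = predict(model, "TF_A", "G3")  # an unseen TF -> target pair
print(prediction)
```

Dropping `topological_features` from `feature_vector` gives a sketch of the first approach; keeping it approximates the idea behind the third. On real data, the toy classifier would be replaced by one of the algorithms the paper evaluates, and the prediction on four training pairs carries no biological meaning.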