WENDY: Covariance Dynamics Based Gene Regulatory Network Inference

Yue Wang, Peng Zheng, Yu-Chen Cheng, Zikun Wang, Aleksandr Aravkin
2024-10-11
Abstract: Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Molecular Networks
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **how to infer the gene regulatory network (GRN) structure from single-cell gene expression data, especially when measurements are made at multiple time points after an intervention and the joint distribution across time points is unknown**.

### Problem Background

Determining the structure of the gene regulatory network (GRN) is a central problem in biology. Traditional methods have difficulty directly measuring the expression levels of multiple genes within a single cell, so many methods instead infer the GRN structure from gene expression data. Single-cell RNA sequencing can profile the whole transcriptome of individual cells at scale, but because cells are killed during the experiment, each cell can be measured at only one time point. This makes it difficult to study gene regulatory relationships that require observations at multiple time points.

### Specific Problem

The paper focuses on a specific data type: after an intervention (such as drug treatment), the gene expression levels of different single cells are measured at multiple time points, and data from time points before the system reaches a steady state are selected. Since gene expression at each time point is random and each cell can be measured only once, the joint distribution between different time points cannot be obtained. Although this data type provides more information than other data types, there is currently only one method specifically designed for it, SINCERITIES, which requires data from at least six time points and has a low data utilization rate.

### Solution

To solve this problem, the authors propose a new algorithm, **WENDY (netWork infErence by covariaNce DYnamics)**. The core idea of WENDY is to compute the gene expression covariance matrices at two time points and model how the covariance matrix evolves over time. By casting this model as a non-convex optimization problem, WENDY infers the GRN structure.

### Advantages of WENDY

1. **Only requires data from two time points**: This is especially valuable when cells die due to the intervention or the measurement.
2. **Higher data utilization rate**: For single-cell expression data containing \(n\) genes and \(T\) time points, WENDY extracts \(0.5n^2 + 0.5n\) numerical values (the number of distinct entries of a symmetric \(n \times n\) covariance matrix) for further analysis, significantly improving data utilization.

### Summary

WENDY provides an effective method to infer the GRN structure from single-cell gene expression data, especially when measurements are made at multiple time points after an intervention and the joint distribution is unknown. The method both improves the data utilization rate and enables effective inference with fewer time points.
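
The covariance step described under Solution, and the value count quoted under Advantages, can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic Poisson counts, and the names `covariance_matrix` and `distinct_entries` are hypothetical helpers introduced here. The sketch stops before the optimization step, whose objective is not given in this summary.

```python
import numpy as np


def covariance_matrix(expr):
    """Sample covariance across genes for a (cells x genes) expression array."""
    return np.cov(expr, rowvar=False)


def distinct_entries(K):
    """Upper triangle (including the diagonal) of a symmetric matrix."""
    rows, cols = np.triu_indices(K.shape[0])
    return K[rows, cols]


# Synthetic stand-in data (an assumption, not real measurements).
# Different cells are measured at each time point, so the two samples are
# unpaired and may even differ in size; only per-time-point statistics
# such as the covariance matrix are available.
rng = np.random.default_rng(0)
n_genes = 5
expr_t0 = rng.poisson(lam=4.0, size=(200, n_genes)).astype(float)
expr_t1 = rng.poisson(lam=6.0, size=(150, n_genes)).astype(float)

K0 = covariance_matrix(expr_t0)  # covariance at the first time point
K1 = covariance_matrix(expr_t1)  # covariance at the second time point

values = distinct_entries(K1)
print(K0.shape, K1.shape, values.size)
```

With `n_genes = 5`, each covariance matrix is 5 x 5 and symmetric, so it carries 0.5 * 25 + 0.5 * 5 = 15 distinct values, matching the \(0.5n^2 + 0.5n\) count above.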