DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets

Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford
2023-12-23
Abstract: One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the inference of gene regulatory networks (GRNs). The authors focus on two non-standard challenges:

1. **Cyclic feedback mechanisms in gene regulatory networks**: GRNs inherently contain cycles, so they should not be modeled as directed acyclic graphs (DAGs). Existing methods that identify cyclic structure from dynamics typically do not model uncertainty over the graph.
2. **Significant measurement noise in observational data**: Because observations are noisy, at typical sample sizes a large set of graph structures remains plausible given the data, so methods need to capture this uncertainty.

To address both challenges together, the authors propose **DynGFN**, a framework that uses Generative Flow Networks (GFlowNets) to estimate the posterior distribution over graph structures. It relies on RNA velocity techniques to estimate the rate of change (velocity) of gene expression, which turns Bayesian structure learning into sparse identification of a dynamical system. The method can search the discrete structure space efficiently and, according to the authors, captures the distribution over cyclic structures better than comparable state-of-the-art Bayesian structure learning approaches.

### Main contributions:

1. **A new Bayesian structure learning framework**: The framework targets dynamical system identification and can model complex posterior distributions, in particular posteriors over cyclic graphs. It supports flexible parameterization and can capture both linear and nonlinear dynamics.
2. **A new GFlowNet architecture**: Dynamic GFlowNet (DynGFN) is designed to model the posterior distribution over cyclic structures. By decomposing the graph node by node, DynGFN can search the discrete space of cyclic graphs efficiently.
3. **Empirical evaluation on synthetic dynamic data**: The authors evaluate DynGFN on synthetic data, where it performs particularly well when the posterior over graph structures is highly multimodal.
4. **Application to real biological systems**: Using single-cell RNA velocity data, the authors demonstrate DynGFN for learning posterior distributions over gene regulatory networks.

### Method overview:

- **Dynamical system modeling**: GRN inference is treated as sparse identification of a dynamical system, with the rate of change of gene expression estimated from RNA velocity data.
- **Application of GFlowNets**: GFlowNets estimate the posterior distribution over graph structures, and a node-by-node decomposition improves search efficiency.
- **Parameter learning**: A HyperNetwork produces the parameters of the structural equation model, so that the parameters depend on the graph structure.

### Experimental results:

- **Synthetic data experiments**: On synthetic data, DynGFN performs well on multiple metrics, especially when modeling multimodal posteriors over graph structures.
- **Real data experiments**: On single-cell RNA velocity data, DynGFN learns posterior distributions over gene regulatory networks.

In conclusion, the paper proposes a method for Bayesian structure learning that handles both cyclic feedback mechanisms and measurement noise in gene regulatory networks.
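To make the "sparse identification of a dynamical system" framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes linear dynamics dx/dt = A x with a known Gaussian noise level, and it replaces the GFlowNet sampler with exhaustive enumeration of parent sets, which is only feasible for a handful of genes. The score is a crude plug-in (maximum-likelihood coefficients plus a sparsity penalty), not a marginal likelihood. All function names and parameters (`simulate_velocities`, `parent_set_posterior`, `edge_marginals`, `noise`, `sparsity`) are hypothetical. What it does share with the summary above is the node-by-node decomposition: each gene's parent set is scored independently, so cycles are allowed.

```python
import itertools
import numpy as np

def simulate_velocities(A, n_cells=200, noise=0.5, seed=0):
    """Draw expression states x and noisy velocities dx/dt = A x + eps."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_cells, A.shape[0]))
    dX = X @ A.T + noise * rng.normal(size=X.shape)
    return X, dX

def parent_set_posterior(X, dx_i, noise=0.5, sparsity=2.0):
    """Posterior over parent sets of one gene (assumed plug-in Gaussian score)."""
    d = X.shape[1]
    parent_sets, log_scores = [], []
    for k in range(d + 1):
        for parents in itertools.combinations(range(d), k):
            cols = list(parents)
            if cols:
                # Least-squares fit of this gene's velocity on the candidate parents.
                coef, *_ = np.linalg.lstsq(X[:, cols], dx_i, rcond=None)
                resid = dx_i - X[:, cols] @ coef
            else:
                resid = dx_i
            log_lik = -0.5 * np.sum(resid ** 2) / noise ** 2
            log_prior = -sparsity * len(cols)  # penalize each extra edge
            parent_sets.append(parents)
            log_scores.append(log_lik + log_prior)
    log_scores = np.array(log_scores)
    probs = np.exp(log_scores - log_scores.max())  # stable normalization
    return parent_sets, probs / probs.sum()

def edge_marginals(X, dX, **kwargs):
    """P[i, j] = posterior probability that gene j regulates gene i."""
    d = X.shape[1]
    P = np.zeros((d, d))
    for i in range(d):
        parent_sets, probs = parent_set_posterior(X, dX[:, i], **kwargs)
        for parents, p in zip(parent_sets, probs):
            for j in parents:
                P[i, j] += p
    return P

# Cyclic ground truth: gene 0 -> gene 1 -> gene 2 -> gene 0.
A_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
X, dX = simulate_velocities(A_true)
print(np.round(edge_marginals(X, dX), 3))
```

Because the enumeration visits all 2^d parent sets per gene, it scales exponentially in the number of genes. That cost is the motivation for a learned sampler such as a GFlowNet over the same per-node decision space.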