Joint trajectory and network inference via reference fitting

Stephen Y Zhang
2024-09-11
Abstract: Network inference, the task of reconstructing interactions in a complex system from experimental observables, is a central yet extremely challenging problem in systems biology. While much progress has been made in the last two decades, network inference remains an open problem. For systems observed at steady state, limited insights are available since temporal information is unavailable and thus causal information is lost. Two common avenues for gaining causal insights into system behaviour are to leverage temporal dynamics in the form of trajectories, and to apply interventions such as knock-out perturbations. We propose an approach for leveraging both dynamical and perturbational single cell data to jointly learn cellular trajectories and power network inference. Our approach is motivated by min-entropy estimation for stochastic dynamics and can infer directed and signed networks from time-stamped single cell snapshots.
Quantitative Methods, Machine Learning
What problem does this paper attempt to address?
The core problem this paper addresses is network inference: reconstructing the interaction network of a complex system from experimental observations. Specifically, the authors propose a method that uses dynamical and perturbational single-cell data to jointly learn cell trajectories and perform network inference. The method targets a key limitation of existing approaches in systems biology. When a system is observed at steady state, temporal information is unavailable, so causal information is lost.

### Background of the Paper

Network inference is a central problem in systems biology. Its aim is to reconstruct the interactions of a complex system from experimental observations. Significant progress has been made over the past two decades, but network inference remains an open problem. For systems at steady state, causality is difficult to determine because temporal information is missing. There are two common ways to recover causal information. The first is to exploit temporal dynamics in the form of trajectories. The second is to apply interventions such as gene knockouts.

### Objectives of the Paper

The paper proposes a new method that jointly infers cell trajectories and network structure by combining dynamical and perturbational data. The method is motivated by min-entropy estimation for stochastic dynamics. It infers directed and signed networks from time-stamped single-cell snapshots.

### Overview of the Method

1. **Dynamic Inference**:
   - Model the cell state \( X_t\in\mathbb{R}^d \) as an autonomous drift-diffusion stochastic process driven by Brownian noise \( B_t \):
     \[ dX_t = f(X_t)\,dt+\sigma\,dB_t,\quad X_0\sim\rho_0 \]
   - Consider the time-series observation setting, with snapshots taken at \( T\geq2 \) consecutive time points \( 0 = t_1,\ldots,t_T = 1 \). The snapshot at time \( t_i \) contains \( N_i \) independently measured cell states \( X_i=\{x_{ij}\}_{j = 1}^{N_i}\subset\mathbb{R}^d \).
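The observation model can be illustrated by simulating the SDE and drawing independent snapshots. This is a minimal sketch, not the authors' code. It assumes the following:

- The standard Euler-Maruyama scheme is used to integrate \( dX_t = f(X_t)\,dt+\sigma\,dB_t \).
- A fresh batch of \( N_i \) cells is simulated for each time point. This mimics destructive single-cell measurements, where no cell is tracked across snapshots.
- The function name `simulate_snapshots`, the step size `dt`, the drift \( f(x) = -x \) and the initial distribution in the usage example are all hypothetical.

```python
import numpy as np

def simulate_snapshots(f, sigma, sample_x0, times, n_cells, dt=1e-3, seed=0):
    """Hypothetical helper: return one array of shape (N_i, d) per time t_i.

    Each snapshot starts from a fresh draw X_0 ~ rho_0 and is integrated with
    Euler-Maruyama up to t_i, so snapshots are independent across time points.
    """
    rng = np.random.default_rng(seed)
    snapshots = []
    for t_i, n_i in zip(times, n_cells):
        x = sample_x0(rng, n_i)                      # X_0 ~ rho_0, shape (n_i, d)
        n_steps = int(round(t_i / dt))
        for _ in range(n_steps):                     # Euler-Maruyama steps up to t_i
            noise = rng.standard_normal(x.shape)     # Brownian increments / sqrt(dt)
            x = x + f(x) * dt + sigma * np.sqrt(dt) * noise
        snapshots.append(x)
    return snapshots

# Usage with a hypothetical linear drift f(x) = -x in d = 2 and T = 3 time points.
snaps = simulate_snapshots(
    f=lambda x: -x,
    sigma=0.5,
    sample_x0=lambda rng, n: 2.0 + 0.1 * rng.standard_normal((n, 2)),
    times=[0.0, 0.5, 1.0],
    n_cells=[200, 150, 100],
)
print([s.shape for s in snaps])  # [(200, 2), (150, 2), (100, 2)]
```

Only the unpaired clouds \( X_1,\ldots,X_T \) are visible to an inference method. The trajectories linking them must be inferred.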
2. **Reference Fitting**:
   - Replace the Brownian reference process of min-entropy estimation with a more general Ornstein-Uhlenbeck (OU) process. The OU process reduces to Brownian motion when \( A = 0 \) and \( b = 0 \). It is defined by the linear SDE
     \[ dX_t=(AX_t + b)\,dt+\sigma\,dB_t \]
   - Here \( A\in\mathbb{R}^{d\times d} \) is the linear interaction matrix and \( b\in\mathbb{R}^d \) is a constant drift term. Each entry \( A_{ij} \) represents the influence of gene \( j \) on gene \( i \). A positive value indicates activation and a negative value indicates inhibition.
   - The parameters are fitted by solving
     \[ \min_{A\in\mathbb{R}^{d\times d},\,b\in\mathbb{R}^d}\ \min_{\pi\in C(\mu,\mu')}\sigma^2\,\mathrm{KL}\big(\pi\,\|\,K_{\sigma}^{(A,b)}\big)+R(A,b) \]
     In this objective:
     - \( C(\mu,\mu') \) is the set of couplings between consecutive snapshot distributions \( \mu \) and \( \mu' \).
     - \( K_{\sigma}^{(A,b)} \) is the reference coupling induced by the OU process.
     - \( R(A,b) \) is a regularization term that ensures the optimization problem is well defined.
3. **Perturbation Modeling**:
   - Consider gene-knockout perturbations. For a knockout of gene \( g \), the linear interaction matrix \( A \) is modified to \( A^{(g)} \) by setting its \( g \)-th row to zero. As a result, the expression of the knocked-out gene \( g \) no longer depends on other genes.
   - For a set of perturbed genes \( G \) together with the wild-type trajectories, the optimization objective is: \[ \min_{A}\frac{1}{|G| + 1}\sum_{g\in G\cup\{\emptyset\}}\left[\frac{1}{T - 1}\sum_{i = 1}^{
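For fixed \( (A, b) \), the inner minimization over couplings in the reference-fitting objective is a KL projection. It can be solved between two empirical snapshots with Sinkhorn scaling. The factor \( \sigma^2 \) does not change the minimizer.

This is a minimal sketch under several stated assumptions:

- Both snapshots carry uniform weights.
- The OU transition density is replaced by its short-time Euler approximation, a Gaussian with mean \( x + (Ax+b)\,\Delta t \) and covariance \( \sigma^2\Delta t\, I \). This is not the exact OU kernel.
- The outer fit over \( (A, b) \) and the regularizer \( R \) are omitted.
- The function names and the matrix \( A \) in the usage example are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def ou_log_kernel(x, y, A, b, sigma, dt):
    """Log of an unnormalized Gaussian transition kernel from x_j to y_k.

    Assumption: Euler approximation of the OU transition, with mean
    x + (A x + b) dt and covariance sigma^2 dt I (not the exact OU kernel).
    """
    mean = x + (x @ A.T + b) * dt                                  # (n, d)
    sq = ((y[None, :, :] - mean[:, None, :]) ** 2).sum(axis=-1)    # (n, m)
    return -sq / (2.0 * sigma**2 * dt)

def kl_projection(x, y, A, b, sigma, dt, n_iter=500):
    """Log-domain Sinkhorn for argmin over couplings pi in C(mu, mu') of KL(pi || K)."""
    n, m = len(x), len(y)
    log_mu, log_nu = np.full(n, -np.log(n)), np.full(m, -np.log(m))  # uniform marginals
    log_K = ou_log_kernel(x, y, A, b, sigma, dt)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = log_mu - logsumexp(log_K + g[None, :], axis=1)   # rescale to match row marginal
        g = log_nu - logsumexp(log_K + f[:, None], axis=0)   # rescale to match column marginal
    return np.exp(f[:, None] + log_K + g[None, :])           # pi = diag(u) K diag(v)

# Usage with hypothetical snapshots and a hypothetical interaction matrix.
rng = np.random.default_rng(1)
x = rng.standard_normal((30, 2))          # cells at t_i
y = rng.standard_normal((40, 2))          # cells at t_{i+1}
A = np.array([[-1.0, 0.0], [0.5, -1.0]])
b = np.zeros(2)
pi = kl_projection(x, y, A, b, sigma=0.5, dt=0.5, n_iter=2000)
print(pi.shape)  # (30, 40)
```

Working in the log domain avoids underflow when \( \sigma^2\Delta t \) is small relative to the spread of the cells. In that regime many kernel entries would otherwise round to zero.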
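The knockout construction and the sign convention for \( A \) can be made concrete in a few lines. This is a sketch under the following assumptions:

- All function names are hypothetical.
- Only \( A \) is modified by a knockout, since the summary says nothing about \( b \).
- `condition_loss` is a hypothetical stand-in for the bracketed per-condition term of the perturbation objective, whose full form is not reproduced here. The averaging over \( G\cup\{\emptyset\} \) is the only part taken from the formula.

```python
import numpy as np

def knockout(A, g):
    """Return A^(g): a copy of A with row g zeroed (gene g no longer depends on other genes)."""
    A_g = np.array(A, dtype=float, copy=True)
    A_g[g, :] = 0.0
    return A_g

def signed_edges(A, tol=1e-8):
    """Directed, signed edges (j, i, sign) read from A[i, j], the influence of gene j on gene i."""
    edges = []
    for i, j in zip(*np.nonzero(np.abs(A) > tol)):   # row-major order over nonzero entries
        edges.append((int(j), int(i), "activates" if A[i, j] > 0 else "inhibits"))
    return edges

def averaged_objective(A, G, condition_loss):
    """Average a per-condition loss over G and the wild type (None denotes the wild type).

    condition_loss(A_cond, g) is a hypothetical stand-in for the bracketed term.
    """
    conditions = [None] + list(G)
    losses = [condition_loss(A if g is None else knockout(A, g), g) for g in conditions]
    return sum(losses) / len(conditions)               # 1 / (|G| + 1) normalization

# Usage: hypothetical 3-gene network where gene 0 activates gene 1 and gene 1 inhibits gene 2.
A = np.array([[-1.0, 0.0, 0.0],
              [0.8, -1.0, 0.0],
              [0.0, -0.5, -1.0]])
print(signed_edges(knockout(A, 1)))  # [(0, 0, 'inhibits'), (1, 2, 'inhibits'), (2, 2, 'inhibits')]
```

In this usage example, knocking out gene 1 removes every edge into gene 1, including its diagonal entry. It keeps gene 1's outgoing edge to gene 2, because only row \( g \) is zeroed.

Diagonal entries are reported as self-edges here. Whether to keep or filter them is a modeling choice that this sketch does not settle.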