Bayesian learning of multiple directed networks from observational data
Federico Castelletti, Luca La Rocca, Stefano Peluso, Francesco C. Stingo, Guido Consonni
DOI: https://doi.org/10.1002/sim.8751
2020-09-23
Statistics in Medicine
Abstract: Graphical modeling represents an established methodology for identifying complex dependencies in biological networks, as exemplified in the study of co-expression, gene regulatory, and protein interaction networks. The available observations often exhibit an intrinsic heterogeneity, which impacts on the network structure through the modification of specific pathways for distinct groups, such as disease subtypes. We propose to infer the resulting multiple graphs jointly in order to benefit from potential similarities across groups; on the other hand our modeling framework is able to accommodate group idiosyncrasies. We consider directed acyclic graphs (DAGs) as network structures, and develop a Bayesian method for structural learning of multiple DAGs. We explicitly account for Markov equivalence of DAGs, and propose a suitable prior on the collection of graph spaces that induces selective borrowing strength across groups. The resulting inference allows in particular to compute the posterior probability of edge inclusion, a useful summary for representing flow directions within the network. Finally, we detail a simulation study addressing the comparative performance of our method, and present an analysis of two protein networks together with a substantive interpretation of our findings.
public, environmental & occupational health; medicine, research & experimental; medical informatics; mathematical & computational biology; statistics & probability
What problem does this paper attempt to address?
This paper addresses the problem of inferring multiple directed network structures from observational data. Specifically, the authors focus on learning a directed acyclic graph (DAG) for each of several groups when the observations are intrinsically heterogeneous. Such data typically come from distinct groups, such as disease subtypes. Each group may have its own network structure, yet the structures are expected to share some similarities.
### Main problems
1. **Handling heterogeneous data**: Heterogeneity in the data affects the network structure, because specific pathways may differ between groups. Accounting for this heterogeneity in the analysis is therefore a key issue.
2. **Joint inference of network structures**: To exploit potential similarities between groups, the authors propose to infer the multiple DAGs jointly. This makes it possible to identify structures shared across groups while still capturing the features unique to each group.
3. **Application of Bayesian methods**: The authors develop a Bayesian method for structure learning that explicitly accounts for Markov equivalence of DAGs and uses a prior distribution designed to borrow strength selectively across groups.
### Solutions
1. **Joint inference framework**: By jointly inferring multiple DAGs, the similarities between different populations can be better utilized while retaining the uniqueness of each population.
2. **Markov equivalence classes**: Because Markov-equivalent DAGs cannot be distinguished from observational data alone, the authors use the essential graph as the representative of each equivalence class.
3. **Prior distribution**: A new prior distribution is proposed, based on the skeleton structure of the graph, to selectively borrow strength among different populations.
4. **Directional inference**: When possible, this method can also infer the direction of associations in biological networks, which has practical significance for subsequent experimental designs (such as gene knockout experiments).
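To make the notion of Markov equivalence concrete, the following minimal sketch checks whether two DAGs are Markov equivalent using the standard graphical criterion: same skeleton and same v-structures (colliders `a -> c <- b` with `a` and `b` non-adjacent). It is an illustration only, not the authors' implementation; the function names and the edge-list representation are assumptions made for this example.

```python
from itertools import combinations


def skeleton(edges):
    """Return the undirected skeleton of a DAG given as (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Return colliders (a, c, b) with a -> c <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        # Sort parents so each collider has one canonical representation.
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Two DAGs are Markov equivalent iff skeleton and v-structures coincide."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Toy graphs on three nodes (hypothetical example data).
chain = [("X", "Y"), ("Y", "Z")]           # X -> Y -> Z
reversed_chain = [("Z", "Y"), ("Y", "X")]  # X <- Y <- Z
collider = [("X", "Y"), ("Z", "Y")]        # X -> Y <- Z

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The two chains share a skeleton and contain no collider, so observational data cannot tell them apart; only the collider graph forms its own equivalence class. This is why edge directions can be inferred only "when possible".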
### Method overview
- **Likelihood function**: The data for each group are assumed to follow a multivariate Gaussian distribution whose covariance matrix is constrained by that group's essential graph.
- **Parameter prior**: An objective Bayesian approach is used to specify the parameter prior, and a Markov random field prior is placed on the space of multiple essential graphs.
- **Posterior inference**: Posterior sampling is carried out with a Markov chain Monte Carlo (MCMC) algorithm that explores the model space, in particular the complex space of essential graphs.
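Two ingredients of this overview can be sketched in a few lines. The first function is a generic pairwise Markov random field score over skeleton edges, which rewards an edge for being shared by pairs of groups; the second estimates posterior edge inclusion probabilities as the fraction of sampled graphs containing each edge. Both are simplified illustrations under stated assumptions, not the prior or sampler defined in the paper: the parameters `nu` and `theta`, the function names, and the toy samples are all hypothetical.

```python
from collections import Counter


def log_mrf_prior(skeletons, nu=-1.0, theta=0.5):
    """Unnormalized log prior over one skeleton per group (assumed form).

    skeletons: list of sets of frozenset edges, one set per group.
    nu: per-edge term; negative values favor sparse graphs (hypothetical).
    theta: reward for each pair of groups sharing an edge (hypothetical).
    """
    all_edges = set().union(*skeletons)
    total = 0.0
    for edge in all_edges:
        n = sum(edge in s for s in skeletons)  # groups containing this edge
        total += nu * n + theta * n * (n - 1) / 2  # n*(n-1)/2 sharing pairs
    return total


def edge_inclusion_probs(sampled_graphs):
    """Fraction of sampled graphs that contain each directed edge."""
    counts = Counter(edge for g in sampled_graphs for edge in set(g))
    n_samples = len(sampled_graphs)
    return {edge: c / n_samples for edge, c in counts.items()}


# Two groups: sharing an edge scores higher than having different edges.
ab, ac = frozenset(("A", "B")), frozenset(("A", "C"))
shared = log_mrf_prior([{ab}, {ab}])
distinct = log_mrf_prior([{ab}, {ac}])
print(shared > distinct)

# Toy stand-in for MCMC output: four sampled graphs for one group.
samples = [
    [("A", "B")],
    [("A", "B"), ("B", "C")],
    [("B", "A")],
    [("A", "B")],
]
probs = edge_inclusion_probs(samples)
print(probs[("A", "B")])
```

With `theta > 0` the score increases when groups agree on an edge, which is one simple way to encode borrowing of strength; setting `theta = 0` would treat the groups independently. The edge inclusion estimate treats every sampled edge as directed, whereas essential graphs may also contain undirected edges, which a full implementation would need to handle.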
### Application examples
- **Simulation study**: The performance of this method was evaluated through simulation studies and compared with several existing methods.
- **Real data analysis**: Two protein networks are analyzed, together with a substantive interpretation of the findings.
### Conclusion
The proposed method infers multiple directed network structures from intrinsically heterogeneous observational data. By borrowing strength selectively across groups, it aims to exploit similarities between groups while still accommodating group-specific features, and it yields posterior probabilities of edge inclusion as a summary of flow directions within each network.