MRSL: a causal network pruning algorithm based on GWAS summary data

Lei Hou, Zhi Geng, Zhongshang Yuan, Xu Shi, Chuan Wang, Feng Chen, Hongkai Li, Fuzhong Xue
DOI: https://doi.org/10.1093/bib/bbae086
IF: 9.5
2024-03-17
Briefings in Bioinformatics
Abstract: Causal discovery is a powerful tool to disclose underlying structures by analyzing purely observational data. Genetic variants can provide useful complementary information for structure learning. Recently, Mendelian randomization (MR) studies have provided abundant marginal causal relationships of traits. Here, we propose a causal network pruning algorithm MRSL (MR-based structure learning algorithm) based on these marginal causal relationships. MRSL combines the graph theory with multivariable MR to learn the conditional causal structure using only genome-wide association analyses (GWAS) summary statistics. Specifically, MRSL utilizes topological sorting to improve the precision of structure learning. It proposes MR-separation instead of d-separation and three candidates of sufficient separating set for MR-separation. The results of simulations revealed that MRSL had up to 2-fold higher F1 score and 100 times faster computing time than other eight competitive methods. Furthermore, we applied MRSL to 26 biomarkers and 44 International Classification of Diseases 10 (ICD10)-defined diseases using GWAS summary data from UK Biobank. The results cover most of the expected causal links that have biological interpretations and several new links supported by clinical case reports or previous observational literatures.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
The paper asks how the marginal causal relationships already established by Mendelian randomization (MR) studies can be used to construct a more accurate causal network. Specifically, the authors propose MRSL (MR-based Structure Learning algorithm), a causal network pruning algorithm based on GWAS summary data. It combines multivariable Mendelian randomization (MVMR) with graph-theoretic methods to learn the conditional causal structure and to remove spurious direct edges from the marginal causal graph.

### Specific background of the problem

1. **Importance of causal discovery**
   - Causal discovery is a powerful tool for inferring causal structures from purely observational data and is widely used in fields such as biological networks and disease diagnosis.
   - With the accumulation of large-scale, population-based genome-wide association study (GWAS) data, genetic variants provide new information and can serve as instrumental variables (IVs) for causal inference.
2. **Limitations of existing methods**
   - Existing causal structure learning methods struggle with complex networks, especially when unobserved confounders are present.
   - Some methods rely on the causal sufficiency assumption (i.e., that no unobserved confounders exist), which is difficult to satisfy in practice.
3. **Application of Mendelian randomization**
   - By using genetic variants as instrumental variables, MR can control for unobserved confounding and avoid reverse causality.
   - A large number of univariable and multivariable MR studies have already revealed marginal causal relationships between traits, but these studies focus on the effect of a single exposure or of multiple exposures on a single outcome and do not construct a complete causal network.
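To make the instrumental-variable idea above concrete, here is a minimal Python sketch of how a single marginal causal effect might be estimated from GWAS summary statistics. It uses the standard fixed-effect inverse-variance weighted (IVW) estimator, which is a common MR estimator but is an assumption here: the summary does not say which estimator the paper uses. The function name `ivw_estimate` and all numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    Illustrative only: the estimator choice is an assumption, not taken
    from the MRSL paper.
    """
    # Weight each variant by the inverse variance of its outcome association.
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Made-up summary statistics for three hypothetical genetic variants:
# variant-exposure effects, variant-outcome effects, and outcome standard errors.
beta_x = [0.10, 0.20, 0.30]
beta_y = [0.05, 0.10, 0.15]
se_y = [0.01, 0.01, 0.01]

estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
print(estimate, std_error)
```

In this toy input every variant-outcome effect is exactly half the corresponding variant-exposure effect, so the estimate equals 0.5 up to floating-point rounding. Only per-variant effect sizes and standard errors are needed, which is why summary-level GWAS data suffice.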
### Core problems of the paper

- **How to make full use of existing MR results**, combined with GWAS summary data, to construct a more comprehensive and accurate causal network?
- **How to identify and remove spurious direct causal relationships** in complex causal networks to obtain a conditional causal graph?

### Solutions

The MRSL algorithm proposed in the paper addresses these problems as follows:

1. **Input data**
   - GWAS summary data and a marginal causal graph serve as input.
   - The marginal causal graph can be obtained through pairwise bidirectional MR or by collating the results of published MR studies.
2. **Combination of graph theory and MVMR**
   - Topological sorting is used to improve the precision of structure learning.
   - MR-separation is proposed in place of d-separation, together with three candidate sufficient separating sets for detecting and removing spurious direct edges.
3. **Iterative optimization**
   - Spurious direct edges are removed from the marginal causal graph over multiple iterations, yielding the conditional causal graph.
4. **Performance evaluation**
   - MRSL is evaluated in simulations and compared with eight other competing methods. According to the abstract, it achieved up to a 2-fold higher F1 score and 100 times faster computing time.
5. **Practical application**
   - MRSL is applied to 26 biomarkers and 44 ICD10-defined diseases from the UK Biobank to assess its performance on real data.

Through these steps, MRSL prunes the marginal causal network into a more accurate conditional causal structure using only GWAS summary data and multivariable MR, without requiring individual-level data.
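The pruning idea described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it visits traits in topological order and, for each marginal edge, asks a caller-supplied test whether the direct effect vanishes after adjusting for the outcome's other marginal causes. The function `prune_marginal_graph`, the callback `direct_effect_is_null`, and the choice of adjustment set are all hypothetical stand-ins; in MRSL the test would be an MVMR analysis and the adjustment set one of the paper's three candidate separating sets, neither of which is specified in this summary.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def prune_marginal_graph(parents, direct_effect_is_null):
    """Remove marginal edges whose direct effect is judged to be null.

    parents: dict mapping each trait to the set of its marginal causes
             (must describe an acyclic graph).
    direct_effect_is_null: hypothetical callback (exposure, outcome,
             adjustment_set) -> bool, standing in for an MVMR test.
    """
    # Work on a copy so the marginal graph passed in is left unchanged.
    pruned = {node: set(causes) for node, causes in parents.items()}
    # Causes always appear before their effects in a topological order.
    order = list(TopologicalSorter(parents).static_order())
    for outcome in order:
        for exposure in sorted(parents.get(outcome, ())):
            # Assumed adjustment set: the outcome's other marginal causes.
            adjust_for = set(parents[outcome]) - {exposure}
            if adjust_for and direct_effect_is_null(exposure, outcome, adjust_for):
                pruned[outcome].discard(exposure)
    return pruned, order


# Toy example: the true structure is A -> B -> C, but the marginal graph
# also contains A -> C because A affects C through B.
marginal = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
true_direct_edges = {("A", "B"), ("B", "C")}


def oracle_test(exposure, outcome, adjustment_set):
    # Stand-in for MVMR: consults the known truth of the toy example.
    return (exposure, outcome) not in true_direct_edges


pruned, order = prune_marginal_graph(marginal, oracle_test)
print(order)
print(pruned)
```

In the toy example the edge from A to C is removed because its effect is fully mediated by B, while the edges from A to B and from B to C are kept. Replacing `oracle_test` with a real statistical test on summary statistics is where the substance of a method like MRSL lies.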