Discovery of shared epigenetic pathways across human phenotypes

Ilse Krätschmer, Hannah M. Smith, Daniel L. McCartney, Elena Bernabeu, Mahdi Mahmoudi, Archie Campbell, Janie Corley, Sarah E Harris, Simon R. Cox, Riccardo E. Marioni, Matthew R. Robinson
DOI: https://doi.org/10.1101/2024.04.15.589547
2024-04-16
Abstract: Omics-based association studies typically consider the marginal effects of a feature, such as CpG DNA methylation, on a trait (e.g., independent models for each feature). Although some methods can assess all features together in joint and conditional estimation, this is currently done on a trait-by-trait basis. Here, we introduce MAJA, a method to learn shared and outcome-specific effects for multiple traits in multi-omics data. MAJA determines the unique contribution of individual loci, genes, or molecular pathways, to variation in one or more traits, conditional on all other measured “omics” data genome-wide. Simulations show MAJA accurately finds shared and distinct associations between omics-data and multiple traits and estimates omics-specific (co)variances, allowing for sparsity and correlations within the data. Applying MAJA to 12 outcome traits in Generation Scotland methylation data (n=18,264), we find novel shared epigenetic pathways among cholesterol metabolism, osteoarthritis, blood pressure and asthma. In contrast to marginal testing, we find only 10 CpG probes with significant effects above the genome-wide background. This highlights the need for joint association testing in highly correlated methylation data from whole blood and for studies of increased sample size in order to refine epigenomic associations in observational data.
Genetics
What problem does this paper attempt to address?
This paper addresses how to estimate shared and trait-specific effects across multiple phenotypes simultaneously in multi-omics data. Existing methods usually test each feature's effect on a phenotype separately, or, when they do estimate all features jointly, handle only one phenotype at a time. This limits what can be learned about mechanisms shared among phenotypes, and the limitation is most acute in highly correlated methylation data.

To address this, the authors introduce MAJA (MultivAriate Joint bAyesian model), which learns the shared and specific effects of multiple phenotypes from multi-omics data. MAJA determines the unique contribution of individual loci, genes, or molecular pathways to variation in one or more phenotypes, conditional on all other measured "omics" data genome-wide. This lets researchers identify shared and distinct associations among phenotypes more accurately and estimate omics-specific (co)variances, while allowing for sparsity and correlation in the data.

Specifically, the paper makes the following contributions:

1. **Method innovation**: The authors develop MAJA, a multivariate Bayesian multiple regression model that simultaneously estimates omics effects on multiple phenotypes and their covariances while accounting for correlation and sparsity in the data.
2. **Simulation verification**: Simulations show that MAJA accurately finds shared and specific associations between omics data and multiple phenotypes and estimates omics-specific (co)variances.
3. **Practical application**: Applied to Generation Scotland methylation data (18,264 individuals, 831,349 CpG sites), MAJA identifies new shared epigenetic pathways among cholesterol metabolism, osteoarthritis, blood pressure, and asthma.
4. **Performance improvement**: Compared with single-phenotype models, MAJA predicts more accurately in external samples, especially for general cognitive function, where the proportion of variance explained is notably higher.

In conclusion, the paper aims to improve multi-phenotype association analysis in multi-omics data by developing and applying MAJA, with the goals of better understanding mechanisms shared among phenotypes, improving prediction performance, and supporting future disease prevention and clinical management.
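The difference between marginal testing and joint, conditional estimation in correlated data can be illustrated with a small simulation. The sketch below is not MAJA and does not reproduce the authors' model or sampler: it uses ridge regression as a simple stand-in for joint estimation, and every name and parameter in it (`simulate`, `marginal_effects`, `joint_effects`, `rho`, `ridge`, the effect sizes) is a hypothetical choice made for illustration only.

```python
import numpy as np

def simulate(n=2000, p=5, rho=0.9, seed=0):
    """Simulate p equicorrelated features and two traits (illustrative values only)."""
    rng = np.random.default_rng(seed)
    # Correlation matrix with 1 on the diagonal and rho elsewhere
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
    # Assumed true effects: feature 0 is shared by both traits,
    # feature 1 affects only the second trait, the rest are null
    B = np.zeros((p, 2))
    B[0] = [0.5, 0.5]
    B[1, 1] = 0.3
    Y = X @ B + rng.normal(size=(n, 2))
    return X, Y, B

def marginal_effects(X, Y):
    """One single-feature regression per feature and trait (marginal testing)."""
    return (X.T @ Y) / (X ** 2).sum(axis=0)[:, None]

def joint_effects(X, Y, ridge=1.0):
    """All features fitted together for all traits (ridge stand-in, not MAJA)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ Y)

if __name__ == "__main__":
    X, Y, B = simulate()
    np.set_printoptions(precision=2, suppress=True)
    print("true effects:\n", B)
    print("marginal estimates:\n", marginal_effects(X, Y))
    print("joint estimates:\n", joint_effects(X, Y))
```

Under these assumptions, the null features pick up sizeable marginal estimates purely through their correlation with the causal features, whereas the joint fit shrinks them toward zero. This is the same qualitative behaviour the abstract describes when joint testing leaves far fewer significant CpG probes than marginal testing.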