Multivariate Bayesian variable selection with application to multi-trait genetic fine mapping

Travis Canida, Hongjie Ke, Shuo Chen, Zhenyao Ye, Tianzhou Ma
DOI: https://doi.org/10.48550/arXiv.2212.13294
2024-03-02
Abstract: Variable selection has played a critical role in modern statistical learning and scientific discoveries. Numerous regularization and Bayesian variable selection methods have been developed in the past two decades for variable selection, but most of these methods consider selecting variables for only one response. As more data is being collected nowadays, it is common to analyze multiple related responses from the same study. Existing multivariate variable selection methods select variables for all responses without considering the possible heterogeneity across different responses, i.e. some features may only predict a subset of responses but not the rest. Motivated by the multi-trait fine mapping problem in genetics to identify the causal variants for multiple related traits, we developed a novel multivariate Bayesian variable selection method to select critical predictors from a large number of grouped predictors that target at multiple correlated and possibly heterogeneous responses. Our new method is featured by its selection at multiple levels, its incorporation of prior biological knowledge to guide selection and identification of best subset of responses predictors target at. We showed the advantage of our method via extensive simulations and a real fine mapping example to identify causal variants associated with different subsets of addictive behaviors.
Methodology
What problem does this paper attempt to address?
This paper addresses variable selection in multi-trait genetic fine mapping. The authors develop a new multivariate Bayesian variable selection method to meet the following challenges:

1. **Limitations of existing methods**: Most existing variable selection methods target a single response variable, whereas modern data sets often contain multiple related responses that are analyzed together. Existing multivariate methods select variables for all responses at once and ignore heterogeneity across responses, that is, the possibility that a feature predicts only a subset of the responses.
2. **Requirements for multi-trait genetic fine mapping**: In genetics, multi-trait fine mapping aims to identify the causal variants for multiple related traits. Within a genomic region, a single-nucleotide polymorphism (SNP) may affect several traits, a phenomenon known as pleiotropy, while having no effect on the others.
3. **Incorporating prior biological knowledge**: In genetics applications, background knowledge such as functional annotations and linkage disequilibrium patterns is crucial for guiding variable selection. It narrows the set of candidates and improves the accuracy of identifying causal variants.

### Research objectives

The main objective of this paper is to develop a new multivariate Bayesian variable selection method that can:

- Perform variable selection across multiple related response variables while accounting for heterogeneity between them.
- Identify causal variants more precisely through multi-level selection (at the individual feature and group levels).
- Incorporate prior biological knowledge to guide variable selection and improve the reliability of the results.

### Method characteristics

1. **Multi-level selection**: The method selects at the group level and the individual feature level, and also selects features separately for each response variable, so it can identify features that affect different responses differently.
2. **Subset posterior inclusion probability (subset PIP)**: The subset PIP is introduced to select the subset of traits that a causal variant targets.
3. **Flexible Bayesian framework**: Within a fully Bayesian framework, prior biological knowledge can be incorporated flexibly to guide grouping and to weight the prior probabilities of causal variants.

### Application examples

The authors evaluated the method through extensive simulations and a real multi-trait fine-mapping analysis of UK Biobank data, identifying several important variants associated with different addictive behaviors and their risk factors.

In summary, the paper develops a multivariate Bayesian variable selection method for multi-trait genetic fine mapping that handles heterogeneity between response variables and incorporates prior biological knowledge.
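To make the subset PIP idea under "Method characteristics" concrete, the following minimal sketch estimates it from posterior draws of per-trait inclusion indicators for a single variant. It is an illustration under stated assumptions, not the authors' implementation: it takes the subset PIP to be the posterior frequency with which the variant is selected for exactly a given set of traits, and the function names `subset_pips` and `marginal_pips` and the toy draws are hypothetical.

```python
from collections import Counter

import numpy as np


def subset_pips(indicator_draws):
    """Estimate the posterior probability of each trait subset for one variant.

    indicator_draws: array of shape (n_draws, n_traits) holding 0/1 inclusion
    indicators, one row per posterior draw (assumed to come from a sampler).
    Returns a dict mapping a tuple of trait indices to its posterior frequency.
    """
    draws = np.asarray(indicator_draws, dtype=int)
    # Each draw selects exactly one subset of traits: the indices set to 1.
    counts = Counter(tuple(np.flatnonzero(row).tolist()) for row in draws)
    n_draws = draws.shape[0]
    return {subset: count / n_draws for subset, count in counts.items()}


def marginal_pips(indicator_draws):
    """Per-trait inclusion probability, averaging over all draws."""
    return np.asarray(indicator_draws, dtype=float).mean(axis=0)


# Toy posterior draws for one variant and three traits (made-up values).
draws = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
])

pips = subset_pips(draws)
best_subset = max(pips, key=pips.get)
print("subset PIPs:", pips)
print("most probable trait subset:", best_subset)
print("marginal PIPs per trait:", marginal_pips(draws))
```

In this toy example the variant is most often selected for the first two traits only, which per-trait marginal PIPs alone would not show as a joint statement. The empty tuple corresponds to draws in which the variant is not selected for any trait.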