Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations

Joshua G. Schraiber, Michael D. Edge, Matt Pennell
DOI: https://doi.org/10.1101/2024.02.10.579721
2024-03-07
Abstract: In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, namely including both the genetic relatedness matrix (GRM) and its leading eigenvectors (which correspond to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses.
As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
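The technique named in the abstract (a regression that uses the covariance matrix and also includes its leading eigenvectors as covariates) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gls_with_eigenvectors`, the two-clade covariance matrix `C`, the clade shift of 2.0, and the simulated traits are all hypothetical choices made for the example.

```python
import numpy as np

def gls_with_eigenvectors(y, x, C, n_eig=0):
    """GLS slope of y on x under covariance C (hypothetical helper).

    Optionally adds the n_eig leading eigenvectors of C as fixed-effect
    covariates. Returns the slope for x and its standard error.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    cols = [np.ones(n), x]  # intercept and focal predictor
    if n_eig > 0:
        # eigh returns eigenvalues in ascending order, so the leading
        # eigenvectors are the last columns.
        _, vecs = np.linalg.eigh(C)
        cols.extend(vecs[:, -n_eig:].T)
    X = np.column_stack(cols)

    # Whiten with the Cholesky factor: if C = L L^T, then L^{-1} y has
    # identity covariance, and OLS on whitened data equals GLS.
    L = np.linalg.cholesky(C)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)

    beta, _, rank, _ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    sigma2 = resid @ resid / (n - rank)
    # pinv keeps this usable if an eigenvector is collinear with the intercept.
    cov_beta = sigma2 * np.linalg.pinv(Xw.T @ Xw)
    return beta[1], np.sqrt(cov_beta[1, 1])

# Toy structured sample (assumption): two clades of unequal size, with
# covariance 0.9 within a clade and 0 between clades.
rng = np.random.default_rng(0)
sizes = (15, 25)
n = sum(sizes)
clade = np.repeat([0, 1], sizes)
C = np.where(clade[:, None] == clade[None, :], 0.9, 0.0)
np.fill_diagonal(C, 1.0)

# x has no causal effect on y, but both traits carry the same unmodelled
# shift in clade 1, which is the kind of structure that can mislead a test.
chol = np.linalg.cholesky(C)
x = chol @ rng.standard_normal(n) + 2.0 * clade
y = chol @ rng.standard_normal(n) + 2.0 * clade

for k in (0, 1):
    slope, se = gls_with_eigenvectors(y, x, C, n_eig=k)
    print(f"n_eig={k}: slope={slope:.3f}, se={se:.3f}, z={slope / se:.2f}")
```

In this toy setup the leading eigenvector of `C` is proportional to the indicator of the larger clade, so adding it as a covariate absorbs the shared clade shift as a fixed effect. How much the two fits differ depends on the simulated draw, so the printed numbers should be read as an illustration rather than as a result.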
Genomics
What problem does this paper attempt to address?
The main problem this paper addresses is the **unification of methods for mapping phenotypes in structured populations**. Specifically, it aims to bridge the methodological gap between statistical genetics and phylogenetics. By deriving a general model of the covariance between the genetic contributions to the phenotypes of different individuals, it shows that standard statistical-genetics models (such as Genome-Wide Association Studies, GWAS) and phylogenetic comparative models (such as phylogenetic regression) can be interpreted as special cases of this more general model. This yields a unified understanding of the strengths and limitations of different methods for controlling for genetic structure, clarifies why and when spurious correlations may occur, and suggests ways to mitigate them.

### Background and Objectives of the Paper

The paper points out that in both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci, or other aspects of the phenotype or environment, and a focal trait. Although the two fields share this goal, they have developed disparate statistical traditions, and their methods differ substantially. As questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data within and among species, the conceptual divisions between the fields are becoming blurred. The paper therefore sets out to bridge this gap with a general model of the covariance between the genetic contributions to different individuals' phenotypes, unifying the methods of statistical genetics and phylogenetics.

### Main Contributions

1. **Proposal of a General Model**: The paper derives a general model of the covariance between the genetic contributions to different individuals' phenotypes, and shows that standard GWAS and phylogenetic regression models can be regarded as special cases of it.
2. **Unification of Methodologies**: Using analytical theory and population-genetic and phylogenetic simulations, the paper examines why and when spurious correlations may occur and how different methods control for them.
3. **Cross-Domain Applications**: The paper shows how a method from statistical genetics can be applied to phylogenetic analysis: including both the genetic relatedness matrix (GRM) and its leading eigenvectors (the principal components of the genotype matrix) in a regression model mitigates spurious correlations in phylogenetic analyses.

### Case Study

As a case study, the paper re-examines an analysis testing for co-evolution of gene expression levels across a fungal phylogeny. It reports that including eigenvectors of the covariance matrix as covariates decreases the false positive rate while increasing the true positive rate.

### Conclusion

Overall, the paper provides a foundation for more integrative approaches to understanding the genetic architecture of phenotypes and how evolutionary processes shape it, and it encourages the exchange of methods between statistical genetics and phylogenetics.
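The shared structure of GWAS-style mixed models and phylogenetic regression mentioned in the contributions above can be written compactly. The notation below is a conventional textbook form chosen for illustration; the symbols $\boldsymbol{\Sigma}$, $\sigma_g^2$, and $\sigma_e^2$ are assumptions of this sketch and are not taken from the paper.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
& \mathbf{g} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^2 \boldsymbol{\Sigma}\right),
& \boldsymbol{\varepsilon} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 \mathbf{I}\right), \\
\hat{\boldsymbol{\beta}} &= \left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{y},
& \mathbf{V} &= \sigma_g^2 \boldsymbol{\Sigma} + \sigma_e^2 \mathbf{I}.
\end{aligned}
```

In a GWAS-style mixed model, $\boldsymbol{\Sigma}$ is typically a GRM estimated from genotypes. In phylogenetic regression under Brownian motion, it is typically the matrix of shared branch lengths implied by the tree. Adding the leading eigenvectors of $\boldsymbol{\Sigma}$ as extra columns of $\mathbf{X}$ corresponds to the eigenvector-covariate approach described in the case study.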