Abstract: In both statistical genetics and phylogenetics, a major goal is to identify associations between a focal trait and genetic loci or other aspects of the phenotype or environment. Each field has developed sophisticated but disparate statistical traditions for this task. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and as once-clear conceptual divisions blur. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. This approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. Because these models share the same core architecture, we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. Using analytical theory together with population-genetic and phylogenetic simulations of quantitative traits, we develop intuition for when and why spurious correlations may occur. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, which includes both the genetic relatedness matrix (GRM) and its leading eigenvectors (corresponding to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses.
As a case study, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including eigenvectors of the covariance matrix as covariates decreases the false-positive rate while simultaneously increasing the true-positive rate. More generally, this work provides a foundation for more integrative approaches to understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
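The shared architecture described above can be illustrated with a minimal generalized least squares (GLS) sketch. It assumes the standard textbook forms, which may differ from the paper's own notation: a linear mixed model y = Xβ + g + e with g ~ N(0, σ_g² K) and e ~ N(0, σ_e² I), and a phylogenetic regression y ~ N(Xβ, σ² C) with C built from shared branch lengths. Both reduce to GLS with a covariance matrix known up to scale. The function names `gls_with_pcs` and `leading_eigenvectors` are hypothetical. The covariance matrix is assumed known here, whereas mixed-model software usually estimates the variance components. The optional `k` argument adds the leading eigenvectors of that matrix as fixed covariates, mirroring the GRM-plus-principal-components strategy the abstract describes.

```python
import numpy as np


def leading_eigenvectors(C, k):
    """Return the k eigenvectors of the symmetric matrix C with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # reorder from largest to smallest
    return eigvecs[:, order[:k]]


def gls_with_pcs(y, x, C, k=0):
    """Hypothetical GLS regression of y on x with residual covariance proportional to C.

    The design matrix holds an intercept, the focal predictor x and, optionally,
    the k leading eigenvectors of C as extra fixed covariates.
    Returns the estimated slope on x, its standard error, and the t-statistic.
    """
    n = len(y)
    columns = [np.ones(n), x]
    if k > 0:
        columns.extend(leading_eigenvectors(C, k).T)  # one column per eigenvector
    X = np.column_stack(columns)

    # Whiten with the Cholesky factor L (C = L L^T): ordinary least squares on
    # L^{-1} X and L^{-1} y gives the same estimates as GLS on X and y.
    L = np.linalg.cholesky(C)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)

    beta, _, rank, _ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    sigma2 = resid @ resid / (n - rank)             # scale of the residual covariance
    cov_beta = sigma2 * np.linalg.pinv(Xw.T @ Xw)   # sigma^2 (X^T C^{-1} X)^{-1}
    se = np.sqrt(cov_beta[1, 1])
    return beta[1], se, beta[1] / se
```

As a usage example under the same assumptions, a standardized genotype matrix `Z` (individuals × loci) gives a GRM `K = Z @ Z.T / Z.shape[1]`. Setting `C = K + np.eye(n)` assumes equal genetic and residual variance, and comparing `gls_with_pcs(y, x, C, k=0)` with `k=3` shows how adding eigenvectors changes the test. For a phylogenetic analysis, `C` would instead be the tree-derived covariance matrix.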

Inferring trait-specific similarity among individuals from molecular markers and phenotypes with Bayesian regression

Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations

A Regression-based Approach to Robust Estimation and Inference for Genetic Covariance

A Bayesian framework for inference of the genotype-phenotype map for segregating populations

Assessing phenotypic correlation through the multivariate phylogenetic latent liability model

Regression-based Approach for Testing the Association Between Multi-Region Haplotype Configuration and Complex Trait

Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees

Association Test Between Haplotypes and Longitudinal Traits in Complex Pedigrees

Accelerating Bayesian inference of dependency between complex biological traits

Introducing Gaussian covariance graph models in genome-wide prediction

Mapping the genetic architecture of complex traits in experimental populations

Mixed Linear Model Approaches for Analyzing Genetic Models of Complex Quantitative Traits

Multiple Quantitative Trait Analysis Using Bayesian Networks

Statistical Inference for Genetic Relatedness Based on High-Dimensional Logistic Regression

Modelling correlated marker effects in genome-wide prediction via Gaussian concentration graph models

Bayesian structural equation models for inferring relationships between phenotypes: a review of methodology, identifiability, and applications

A Unifying Model for the Analysis of Phenotypic, Genetic and Geographic Data

Optimal Estimation Of Genetic Relatedness In High-Dimensional Linear Models

Linking phenotypic and genotypic variation: a relaxed phylogenetic approach using the probabilistic programming language Stan

Using Genetic Distance to Infer the Accuracy of Genomic Prediction