An integrative approach to prioritize candidate causal genes for complex traits in cattle
Mohammad Ghoreishifar, Iona M. Macleod, Amanda J. Chamberlain, Zhiqian Liu, Thomas J. Lopdell, Mathew D. Littlejohn, Ruidong Xiang, Jennie E. Pryce, Michael E. Goddard
DOI: https://doi.org/10.1101/2024.11.11.622912
2024-11-12
Abstract: Genome-wide association studies (GWAS) have identified many quantitative trait loci (QTL) associated with complex traits. Most lie in non-coding regions, which makes it difficult to pinpoint the causal variants and their target genes. Three types of evidence can help identify the gene through which a QTL acts: (1) proximity to the most significant GWAS variant, (2) correlation of gene expression with the trait, and (3) the gene's physiological role in the trait. However, it remains uncertain how reliably these methods identify the correct genes. Here we test their ability to do so in a comparatively simple series of traits associated with the concentration of polar lipids in milk.
We conducted single-trait GWAS for ~14 million imputed variants and 56 individual milk polar lipid (PL) phenotypes in 336 cows. A meta-analysis of multi-trait GWAS identified 10,063 significant SNPs at FDR ≤ 10% (P ≤ 7.15E-5). Transcriptome data from blood (~12.5K genes, 143 cows) and mammary tissue (~12.2K genes, 169 cows) were analysed using the genetic score omics regression (GSOR) method. This method links observed gene expression to genetically predicted phenotypes and was used to find associations between gene expression and the 56 PL phenotypes. GSOR identified 2,186 genes in blood and 1,404 in mammary tissue associated with at least one PL phenotype (FDR ≤ 1%). We partitioned the genome into non-overlapping windows of 100 Kb to test for overlap between GSOR-identified genes and GWAS signals. The overlap was significant: GSOR-significant genes were more likely to be located within 100 Kb windows that have GWAS signals than within windows that do not (P = 0.01; odds ratio = 1.47). These windows included 70 significant genes expressed in mammary tissue and 95 in blood. Compared to all expressed genes in each tissue, these genes were enriched for lipid metabolism gene ontology (GO) terms. That is, 7 of the 70 significant mammary transcriptome genes (P < 0.01; odds ratio = 3.98) and 5 of the 95 significant blood genes (P < 0.10; odds ratio = 2.24) were annotated with lipid metabolism GO terms. The candidate causal genes include DGAT1, ACSM5, SERINC5, ABHD3, CYP2U1, PIGL, ARV1, SMPD5, and NPC2, with some overlap between the two tissues.
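The window-overlap test described above can be illustrated with a minimal Python sketch. It assigns genes to non-overlapping 100 Kb windows, cross-tabulates them by GSOR significance and by whether their window carries a GWAS signal, and then computes an odds ratio and a one-sided P-value. The abstract does not state which statistical test was used, so Fisher's exact test is an assumption here, and all function names and the toy counts are hypothetical rather than taken from the study.

```python
from math import comb

WINDOW_BP = 100_000  # 100 Kb non-overlapping windows, as in the text


def window_id(chrom, pos, size=WINDOW_BP):
    """Map a base-pair position to its (chromosome, window index) pair."""
    return (chrom, pos // size)


def two_by_two(gene_windows, gsor_hits, gwas_windows):
    """Cross-tabulate genes by GSOR significance and GWAS-signal window.

    gene_windows: dict mapping gene name -> window id
    gsor_hits:    set of GSOR-significant gene names
    gwas_windows: set of window ids containing a significant GWAS SNP
    Returns (a, b, c, d):
      a = significant gene, GWAS window     b = significant gene, no GWAS window
      c = other gene, GWAS window           d = other gene, no GWAS window
    """
    a = b = c = d = 0
    for gene, win in gene_windows.items():
        sig, gwas = gene in gsor_hits, win in gwas_windows
        if sig and gwas:
            a += 1
        elif sig:
            b += 1
        elif gwas:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table (undefined if b or c is zero)."""
    return (a * d) / (b * c)


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact P: chance of seeing >= a in the top-left cell.

    Assumed test; uses the hypergeometric distribution with fixed margins.
    """
    n, row1, col1 = a + b + c + d, a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical toy table, not the study's counts.
toy = (3, 1, 1, 3)
print(odds_ratio(*toy), fisher_greater(*toy))
```

The same 2x2 logic applies to the GO enrichment step, with rows defined by membership in the candidate gene set and columns by lipid metabolism annotation.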
The overlap between GWAS, GSOR, and GO analyses suggests that together these methods can identify genes mediating QTL, though their power remains limited, as reflected by modest odds ratios. Larger sample sizes would enhance the power of these analyses, but issues like linkage disequilibrium would remain.
Genomics