kTWAS: Integrating kernel-machine with transcriptome-wide association studies improves statistical power and reveals novel genes

Chen Cao, Devin Kwok, Shannon Edie, Qing Li, Bowei Ding, Pathum Kossinna, Simone Campbell, Jingjing Wu, Matthew Greenberg, Quan Long
DOI: https://doi.org/10.1101/2020.06.29.177121
2020-06-29
Abstract: The power of genotype-phenotype association mapping studies increases greatly when contributions from multiple variants in a focal region are meaningfully aggregated. Currently, there are two popular categories of variant aggregation methods. Transcriptome-wide association studies (TWAS) represent a category of emerging methods that select variants based on their effect on gene expression, providing pretrained linear combinations of variants for downstream association mapping. In contrast, kernel methods such as SKAT model genotypic and phenotypic variance using various kernel functions that capture genetic similarity between subjects, allowing non-linear effects to be included. From the perspective of machine learning, these two methods cover two complementary aspects of feature engineering: feature selection/pruning, and feature modeling. Thus far, no thorough comparison has been made between these categories, and no method exists that incorporates the advantages of both TWAS and kernel-based methods. In this work we developed a novel method called kTWAS that applies TWAS-like feature selection to a SKAT-like kernel association test, combining the strengths of both approaches. Through extensive simulations, we demonstrate that kTWAS has higher power than TWAS and multiple SKAT-based protocols, and we identify novel disease-associated genes in WTCCC genotyping array data and MSSNG (Autism) sequence data. The source code for kTWAS and our simulations is available in our GitHub repository (https://github.com/theLongLab/kTWAS).
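The idea of pairing TWAS-like feature selection with a SKAT-like kernel test can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the per-variant weights `w` come from some pretrained expression-prediction model (zero weight meaning the variant is pruned), uses a weighted linear kernel, and swaps in a simple permutation p-value in place of the analytic null distribution that SKAT-style tests normally use. All function and variable names here are hypothetical.

```python
import numpy as np


def weighted_linear_kernel(G, w):
    """Genetic similarity kernel K = G diag(w^2) G^T.

    G: (n_subjects, n_variants) genotype matrix (e.g. 0/1/2 allele counts).
    w: (n_variants,) weights, assumed to come from a pretrained
       expression-prediction model; a zero weight drops that variant.
    """
    Gw = G * w          # scale each variant column by its weight
    return Gw @ Gw.T    # (n_subjects, n_subjects), positive semi-definite


def kernel_score_statistic(y, K):
    """Variance-component score statistic Q = r^T K r for an
    intercept-only null model, where r is the mean-centered phenotype."""
    r = y - y.mean()
    return float(r @ K @ r)


def permutation_pvalue(y, K, n_perm=999, seed=0):
    """Permutation p-value for Q (an assumption of this sketch; it
    replaces the analytic mixture-of-chi-square null used by SKAT)."""
    rng = np.random.default_rng(seed)
    q_obs = kernel_score_statistic(y, K)
    hits = sum(
        kernel_score_statistic(rng.permutation(y), K) >= q_obs
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

In this sketch the weights do the feature selection and pruning, while the kernel does the feature modeling; a different kernel function could be substituted in `weighted_linear_kernel` without changing the test statistic or the permutation step.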