Genetic association study of Alzheimer’s disease through whole genome tiling analysis

Jingxuan Bao, Brian N Lee, Sarah Wait Zaranek, Matthew Lee, Manu Shivakumar, Junhao Wen, Jiong Chen, Zixuan Wen, Yang Shu, Heng Huang, Andrew J. Saykin, Paul M. Thompson, Christos Davatzikos, Dokyoon Kim, Alexander Wait Zaranek, Li Shen
DOI: https://doi.org/10.1002/alz.063481
2023-01-01
Abstract:

Background: Numerous genome-wide association studies (GWAS) of Alzheimer’s disease (AD) have identified over 70 AD risk variants using SNP-based genotyping or sequencing data. Recently, a whole-genome tiling (WGT) representation of whole-genome sequencing (WGS) data has been proposed as a new way to define an individual’s genome; this representation can support both supervised and unsupervised machine learning. In this study, we perform a new AD GWAS on the WGT representation of the ADNI WGS data.

Methods: A detailed description, the genome tiling pipeline, and a publicly available example of WGT data are available at https://curii.co/su92l-j7d0g-swtofxa2rct8495. In our analysis, we first performed quality control, imputation, and one-hot encoding of tile variants (Fig. 1). Then, for each genome tile, we used a likelihood ratio test comparing two logistic regression models to obtain a single p-value: the full model used the tile variants and covariates to predict disease status, and the null model used only the covariates (age, sex, education, APOE4, and the first 20 principal components). Participants included 1,504 subjects (1,032 cases and 472 controls). For comparison, a set-based GWAS analysis was performed using PLINK 1.9 on the ADNI SNP-based WGS data.

Results: 8,560,743 tiles passed the QC process and were included in our analysis. The likelihood ratio test yielded 35,582 significant tiles after Bonferroni correction. A comparative set-based GWAS among all significant tiles using SNP-based WGS data identified 1,535 sets with at least one significant SNP variant. Among these 1,535 sets, 1,066 passed uncorrected p≤0.05, 115 passed p≤0.005, and 15 passed p≤0.0005 (Fig. 2).

Conclusions: Our initial investigation of the tiling data shows that the WGT representation has promising power for identifying significant tiles that cannot be detected using the SNP representation. Complementary to the genotype values examined in traditional SNP analysis, the WGT analysis examines the haplotype variants within each tile and can capture the interaction pattern among SNPs within the haplotype. This initial AD GWAS on WGT data demonstrates the promise of the tile representation for revealing novel genetic risk and protective factors in AD.
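The per-tile test described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`fit_logistic_loglik`, `chi2_sf`, `tile_lrt`, `bonferroni_threshold`) are hypothetical, the logistic models are fitted with a plain Newton-Raphson loop, and the one-hot tile matrix is assumed to have one reference variant column dropped so that the full model is identifiable. The family-wise level of 0.05 used for the Bonferroni threshold is also an assumption, since the abstract does not state it.

```python
import math
import numpy as np

def fit_logistic_loglik(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson and return the maximized log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        # A tiny ridge keeps the solve stable; it does not move the optimum (grad = 0).
        step = np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for j in range(df // 2):
        total += term
        term *= half / (j + 1.5)
    return total

def tile_lrt(tile_onehot, covariates, y):
    """Likelihood ratio test: covariates-only null model vs. covariates + tile variants.

    tile_onehot must be a 2-D array (n_samples x n_indicator_columns) with the
    reference variant column already dropped (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    ll_null = fit_logistic_loglik(covariates, y)
    ll_full = fit_logistic_loglik(np.column_stack([covariates, tile_onehot]), y)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    df = tile_onehot.shape[1]  # extra parameters in the full model
    return stat, chi2_sf(stat, df)

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha=0.05 is an assumed family-wise level."""
    return alpha / n_tests
```

Under these assumptions, a tile would be called significant when its p-value from `tile_lrt` falls below `bonferroni_threshold(8_560_743)`, which is roughly 5.8e-9 for the number of tiles reported in the Results.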