Abstract: Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges to access and sharing of genotype-phenotype datasets, including the cost of data management and difficulties in consolidating records across research groups. These issues make methods based on SNP-level summary statistics particularly appealing. The most common way of combining statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating scores prior to their addition, resulting in remarkable power gains in the situations most commonly encountered in practice, namely under heterogeneity of effect sizes and diversity among pairwise LD values. In these situations, the power of the traditional test, based on the added squared scores, quickly reaches a ceiling as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in any additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields a steady gain in power. We present theoretical and computational analyses of both approaches and reveal the causes behind the sometimes dramatic difference in their respective power. We showcase DOT by analyzing breast cancer and cleft lip data, in which our method strengthened levels of previously reported associations and implied the possibility of multiple new alleles that jointly confer disease risk.

Joint analysis of association between the outcome and a group of SNPs within a genetic region is increasingly recognized as a complement to single-SNP analysis that can shed light on the underlying molecular mechanisms. However, the correlation among GWAS association results calls for specifically tailored statistical methods. Here we propose the DOT (Decorrelation by Orthogonal Transformation) method, which can efficiently combine evidence of association over different SNPs and genes within a pathway without access to the original genotypic data. DOT is fast, does not rely on a permutation algorithm, and is often dramatically more powerful than other popular methods, such as VEGAS and the recently proposed ACAT. We believe that DOT will become a useful addition to the toolbox of summary-statistics-based methods for the GWAS community.
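The idea of decorrelating scores before adding their squares can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the SNP-level scores are Z-scores with a known, positive-definite LD correlation matrix, and the decorrelating transformation is the symmetric inverse square root of that matrix, built from its eigendecomposition. The function names (`decorrelate`, `chi2_sf`, `dot_test`) and the example numbers are hypothetical and are not taken from the authors' software.

```python
import math
import numpy as np

def decorrelate(z, sigma):
    """Return x = H z with H = sigma^(-1/2), assuming sigma is positive definite."""
    eigvals, eigvecs = np.linalg.eigh(sigma)  # sigma = Q diag(eigvals) Q^T
    if eigvals.min() <= 1e-10:
        raise ValueError("LD matrix is singular or not positive definite")
    h = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return h @ z

def chi2_sf(x, df):
    """Chi-square upper tail probability via a series for the incomplete gamma.

    Adequate for a sketch; very small p-values lose precision.
    """
    a, y = df / 2.0, x / 2.0
    if y <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * total:
        term *= y / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(y) - y - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def dot_test(z, sigma):
    """Sum of squared decorrelated scores and its chi-square p-value (df = number of SNPs)."""
    x = decorrelate(np.asarray(z, dtype=float), np.asarray(sigma, dtype=float))
    stat = float(np.sum(x ** 2))
    return stat, chi2_sf(stat, len(x))

# Hypothetical Z-scores and LD matrix for three SNPs (illustrative values only).
z = np.array([2.1, -0.4, 1.8])
sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

traditional_stat = float(np.sum(z ** 2))  # null distribution depends on sigma
dot_stat, dot_p = dot_test(z, sigma)
print(traditional_stat, dot_stat, dot_p)
```

Under these assumptions the decorrelated scores are independent standard normal under the null hypothesis, so their squared sum can be referred to a chi-square distribution with as many degrees of freedom as SNPs. The traditional sum of squared correlated scores instead follows a weighted sum of chi-squares whose weights depend on the LD matrix.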

DOT: Gene-set analysis by combining decorrelated association statistics