Abstract:

Motivation: Genetics holds great promise for precision medicine by tailoring treatment to individual patients based on their genetic profiles. Toward this goal, many large-scale genome-wide association studies (GWAS) have been performed in the last decade to identify genetic variants associated with various traits and diseases. These studies have identified tens of thousands of disease-related variants. However, the identified variants explain only a small proportion of the overall heritability of most traits and are of very limited clinical use. This is partly owing to the small effect sizes of most genetic variants, and to the common practice in most GWAS of testing the association between one trait and one genetic variant at a time, even when multiple related traits are measured for each individual. Increasing evidence suggests that many genetic variants influence multiple traits simultaneously, so testing the association with multiple traits jointly can increase power. Because individual-level GWAS phenotype and genotype data are generally very hard to access, it is appealing to develop novel multi-trait association test methods that need only GWAS summary data.

Results: Many existing association test methods based on GWAS summary data rely on ad hoc approaches or crude Monte Carlo approximations. In this article, we develop rigorous statistical methods for efficient and powerful multi-trait association testing. We develop robust and efficient methods to accurately estimate the marginal trait correlation matrix using only GWAS summary data. We construct the principal component (PC)-based association test from the summary statistics. The PC-based test has optimal power when the underlying multi-trait signal can be captured by the first PC; otherwise its performance is suboptimal. We develop an adaptive test that optimally weights the PC-based test and the omnibus chi-square test to achieve robust performance under various scenarios. We develop efficient numerical algorithms to compute analytical P-values for all the proposed tests without the need for Monte Carlo sampling. We illustrate the utility of the proposed methods through application to GWAS meta-analysis summary data for multiple lipid and glycemic traits. We identify multiple novel loci that were missed by individual trait-based association tests.

Availability and implementation: All the proposed methods are implemented in an R package available at http://www.github.com/baolinwu/MTAR. The developed R programs are extremely efficient: it takes less than 2 min to compute the list of genome-wide significant single nucleotide polymorphisms (SNPs) for all proposed multi-trait tests for the lipids GWAS summary data with 2.5 million SNPs on a single Linux desktop.

Supplementary information: Supplementary data are available at Bioinformatics online.
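The contrast between the PC-based test and the omnibus chi-square test described in the abstract can be illustrated with a minimal sketch. It is not the MTAR package's implementation. It assumes that, under the null hypothesis, the vector of per-trait Z-scores for one SNP follows a multivariate normal distribution with mean zero and a known trait correlation matrix R. Under that assumption the omnibus statistic z'R^{-1}z is chi-square with K degrees of freedom (K traits), and the squared projection of z on the first eigenvector of R, divided by the leading eigenvalue, is chi-square with 1 degree of freedom. All function names (`solve`, `first_pc`, `chi2_sf`, `pc_test`, `omnibus_test`) are hypothetical helpers written for this illustration, and the adaptive test that combines the two statistics is not shown.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def first_pc(R, iters=500):
    """Leading eigenvalue and eigenvector of a symmetric matrix by power iteration.

    The start vector (1, 1/2, 1/3, ...) is an arbitrary choice; power iteration
    fails if the start is exactly orthogonal to the leading eigenvector.
    """
    n = len(R)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * R[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if df % 2 == 0:
        term = math.exp(-x / 2.0)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total

def pc_test(z, R):
    """First-PC test: squared projection on the leading eigenvector, 1 df."""
    lam, u = first_pc(R)
    proj = sum(ui * zi for ui, zi in zip(u, z))
    stat = proj * proj / lam
    return stat, chi2_sf(stat, 1)

def omnibus_test(z, R):
    """Omnibus test: z' R^{-1} z, chi-square with K = len(z) df."""
    x = solve(R, z)
    stat = sum(zi * xi for zi, xi in zip(z, x))
    return stat, chi2_sf(stat, len(z))

if __name__ == "__main__":
    R = [[1.0, 0.5], [0.5, 1.0]]       # assumed trait correlation matrix
    for z in ([3.0, 3.0], [3.0, -3.0]):  # toy Z-scores, not real data
        print(z, "PC:", pc_test(z, R), "omnibus:", omnibus_test(z, R))
```

In this toy setting, Z-scores of (3, 3) lie along the first PC of R, so both statistics equal 12, and the PC test's single degree of freedom gives the smaller P-value (about 5e-4 versus exp(-6), about 0.0025). With Z-scores of (3, -3) the signal is orthogonal to the first PC: the PC statistic is essentially zero while the omnibus statistic is 36. This is the trade-off that motivates combining the two tests adaptively.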

A generalized linear mixed model association tool for biobank-scale data

FastBiCmrMLM: a fast and powerful compressed variance component mixed logistic model for big genomic case-control genome-wide association study

Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data

Genome-wide interaction-based association analysis identified multiple new susceptibility Loci for common diseases.

Improving GWAS discovery and genomic prediction accuracy in biobank data

A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank

A Fast and Powerful Empirical Bayes Method for Genome-Wide Association Studies.

A SUPER powerful method for genome wide association study.

Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects

Hierarchical Generalized Linear Mixed Model for Genome-wide Association Analysis

A Fast Algorithm for Bayesian Multi-Locus Model in Genome-Wide Association Studies

UK BioCoin: Swift Trait-Specific Summary Statistics Regression for UK Biobank

Integrate multiple traits to detect novel trait–gene association using GWAS summary data with an adaptive test approach

Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole genome sequencing studies

Multi-trait genome-wide analyses of the brain imaging phenotypes in UK Biobank

Genome-wide discovery for biomarkers using quantile regression at biobank scale

Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank

A comprehensive analysis comparing linear and generalized linear models in detecting adaptive SNPs

A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank

Genetic association studies using disease liabilities from deep neural networks

Uncovering hidden gene-trait patterns through biclustering analysis of the UK Biobank