GAUSS: a summary-statistics-based R package for accurate estimation of linkage disequilibrium for variants, Gaussian imputation, and TWAS analysis of cosmopolitan cohorts

Donghyung Lee, Silviu-Alin Bacanu
DOI: https://doi.org/10.1093/bioinformatics/btae203
IF: 5.8
2024-03-29
Bioinformatics
Abstract:
Motivation: As the availability of larger and more ethnically diverse reference panels grows, there is an increase in demand for ancestry-informed imputation of genome-wide association studies (GWAS), and other downstream analyses, e.g. fine-mapping. Performing such analyses at the genotype level is computationally challenging and necessitates, at best, a laborious process to access individual-level genotype and phenotype data. Summary-statistics-based tools, not requiring individual-level data, provide an efficient alternative that streamlines computational requirements and promotes open science by simplifying the re-analysis and downstream analysis of existing GWAS summary data. However, existing tools perform only disparate parts of needed analysis, have only command-line interfaces, and are difficult to extend/link by applied researchers.
Results: To address these challenges, we present Genome Analysis Using Summary Statistics (GAUSS), a comprehensive and user-friendly R package designed to facilitate the re-analysis/downstream analysis of GWAS summary statistics. GAUSS offers an integrated toolkit for a range of functionalities, including (i) estimating ancestry proportion of study cohorts, (ii) calculating ancestry-informed linkage disequilibrium, (iii) imputing summary statistics of unobserved variants, (iv) conducting transcriptome-wide association studies, and (v) correcting for “Winner’s Curse” biases. Notably, GAUSS utilizes an expansive, multi-ethnic reference panel consisting of 32 953 genomes from 29 ethnic groups. This panel enhances the range and accuracy of imputable variants, including the ability to impute summary statistics of rarer variants. As a result, GAUSS elevates the quality and applicability of existing GWAS analyses without requiring access to subject-level genotypic and phenotypic information.
Availability and implementation: The GAUSS R package, complete with its source code, is readily accessible to the public via our GitHub repository at https://github.com/statsleelab/gauss. To further assist users, we provided illustrative use-case scenarios that are conveniently found at https://statsleelab.github.io/gauss/, along with a comprehensive user guide detailed in Supplementary Text S1.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses five problems that arise when re-analyzing genome-wide association study (GWAS) summary statistics:

1. **Ancestry Proportion Estimation in Multi-Ethnic Populations**: Accurate ancestry proportions for a multi-ethnic cohort are a prerequisite for subsequent analyses based on summary statistics. Many traditional methods require individual-level genotype data, which is often unavailable due to privacy concerns. GAUSS estimates the ancestry proportions of a study cohort using only allele frequencies (AF) or association Z-scores.
2. **Ancestry-Informed Linkage Disequilibrium (LD) Calculation**: As GWAS cohorts become more ancestrally diverse, LD estimates need to reflect the ancestry mix of the cohort. GAUSS provides the `computeLD()` function, which uses its extensive 33KG reference panel to calculate linkage disequilibrium values specific to different ethnic groups.
3. **Imputation of Summary Statistics for Unobserved SNPs**: Traditional genotype imputation methods require individual-level genotype data and are computationally intensive. GAUSS offers the `dist()` and `distmix()` functions to directly impute summary statistics (such as association Z-scores) for unobserved SNPs, applicable to both homogeneous and multi-ethnic populations.
4. **Transcriptome-Wide Association Studies (TWAS)**: GAUSS integrates the TWAS tools `jepeg()` and `jepegmix()`, for homogeneous and heterogeneous populations respectively, to explore the functional links between genetic variation and complex traits.
5. **Correction for "Winner's Curse" Bias**: Winner's Curse is the tendency of the most significant association signals to have overestimated effects. It matters because, in genetic studies, "sub-threshold" association signals often have a greater impact on trait variation than statistically significant variants. GAUSS integrates the FIQT (False Discovery Rate Inverse Quantile Transformation) method to adjust association statistics for this bias.
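To make the idea of imputing summary statistics directly (item 3) more concrete, here is a minimal Python sketch of the standard conditional-Gaussian estimate: the Z-score of an unobserved SNP is predicted as a weighted sum of the Z-scores of nearby observed SNPs, with weights derived from LD correlations. This is an illustration under stated assumptions, not the GAUSS implementation. The function names `solve` and `impute_z`, the `ridge` parameter, and the toy LD values are all hypothetical, and the sketch assumes the LD correlations have already been obtained from a reference panel.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; try a larger ridge value")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(z_typed, ld_typed, ld_untyped_typed, ridge=0.1):
    """Impute the Z-score of one unobserved SNP from observed neighbours.

    z_typed          -- Z-scores of the observed SNPs
    ld_typed         -- LD correlation matrix among the observed SNPs
    ld_untyped_typed -- LD correlations between the unobserved SNP and each observed SNP
    ridge            -- hypothetical regularization added to the diagonal for stability

    Returns (z_hat, info), where info is a rough imputation-quality score.
    """
    n = len(z_typed)
    if len(ld_typed) != n or len(ld_untyped_typed) != n:
        raise ValueError("dimension mismatch between Z-scores and LD inputs")
    regularized = [
        [ld_typed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Weights w solve (S_tt + ridge * I) w = S_tu.
    weights = solve(regularized, ld_untyped_typed)
    z_hat = sum(w * z for w, z in zip(weights, z_typed))
    info = sum(w * c for w, c in zip(weights, ld_untyped_typed))
    return z_hat, info


# Toy example with made-up LD values for three observed SNPs.
z_hat, info = impute_z(
    z_typed=[4.2, 3.9, 1.1],
    ld_typed=[[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]],
    ld_untyped_typed=[0.9, 0.85, 0.25],
)
print(f"imputed Z = {z_hat:.3f}, info = {info:.3f}")
```

With `ridge=0` and a single observed SNP, the estimate reduces to the LD correlation times the observed Z-score, which is a useful sanity check. For a multi-ethnic cohort, the LD inputs would plausibly be an ancestry-weighted mixture of population-specific correlations, which is the setting that `distmix()` is described as handling.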
Through these features, GAUSS aims to simplify the re-analysis and downstream analysis of large GWAS summary statistics without the need for access to individual-level genotype and phenotype data, thereby promoting open science and enhancing the quality and applicability of existing GWAS analyses.