Abstract: Next-generation sequencing has led to many complex-trait rare-variant (RV) association studies. Although single-variant association analysis can be performed, it is grossly underpowered. Researchers have therefore developed many RV association tests that aggregate multiple variant sites across a genetic region (e.g., a gene) and test for association between the trait and the aggregated genotype. Once an aggregate test detects an association, only the average genetic effect for the group of RVs can be estimated, and because of the "winner's curse" that estimate can be biased. For common variants, unbiased estimates of genetic parameters can be obtained by analyzing a replication sample. For RVs, it is desirable to obtain unbiased estimates from the same study in which the association was identified, because RV sites and frequencies can be substantially heterogeneous even among closely related populations. To obtain an unbiased estimate for aggregated RV analysis, we developed bootstrap-sample-split algorithms that reduce the bias of the winner's curse. These unbiased estimates are important for understanding the population-specific contribution of RVs to the heritability of complex traits. We also demonstrate, both theoretically and via simulations, that for aggregate RV analysis the genetic variance for a gene or region will always be underestimated, sometimes substantially, because of the presence of noncausal variants or of causal variants with effects of different magnitudes or directions. Therefore, even if RVs play a major role in complex-trait etiologies, a portion of the heritability will remain missing, and the contribution of RVs to complex-trait etiologies will be underestimated.
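The abstract does not spell out the bootstrap-sample-split algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: each bootstrap replicate uses the resampled (in-bag) individuals to decide whether the aggregate association is detected, and the left-out (out-of-bag) individuals to estimate the effect, so that selection and estimation never share data. The function names, the simple burden score, the normal-approximation p-value, and the defaults (`alpha`, `n_boot`, `seed`) are illustrative choices, not taken from the paper.

```python
import math
import numpy as np


def slope_and_pvalue(x, y):
    """OLS slope of y on x with a two-sided p-value (normal approximation)."""
    xc = x - x.mean()
    sxx = float(xc @ xc)
    if sxx == 0.0:
        # No variation in the burden score (e.g. no carriers): nothing to estimate.
        return float("nan"), 1.0
    yc = y - y.mean()
    beta = float(xc @ yc) / sxx
    resid = yc - beta * xc
    se = math.sqrt(float(resid @ resid) / (len(x) - 2) / sxx)
    if se == 0.0:
        return beta, 0.0
    z = beta / se
    return beta, math.erfc(abs(z) / math.sqrt(2.0))


def bootstrap_split_estimate(burden, trait, alpha=0.05, n_boot=500, seed=0):
    """Average out-of-bag effect estimate over replicates that detect in-bag.

    Illustrative assumption: detection means p < alpha for the burden score.
    """
    rng = np.random.default_rng(seed)
    n = len(trait)
    estimates = []
    for _ in range(n_boot):
        in_bag = rng.integers(0, n, size=n)      # sample with replacement
        out_of_bag = np.ones(n, dtype=bool)
        out_of_bag[in_bag] = False               # individuals never drawn
        if out_of_bag.sum() < 3:
            continue
        _, p_detect = slope_and_pvalue(burden[in_bag], trait[in_bag])
        if p_detect >= alpha:
            continue                             # association not detected
        beta_oob, _ = slope_and_pvalue(burden[out_of_bag], trait[out_of_bag])
        if not math.isnan(beta_oob):
            estimates.append(beta_oob)
    return float(np.mean(estimates)) if estimates else float("nan")
```

Here `burden` would be a per-individual aggregated genotype (for example, an indicator of carrying any RV in the gene) and `trait` a quantitative phenotype, both as NumPy arrays of equal length. Comparing the returned value with the naive full-sample slope gives a rough sense of how much selection on significance inflates the naive estimate.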

The Value of Statistical or Bioinformatics Annotation for Rare Variant Association with Quantitative Trait

Meta-analysis of Gene-Level Tests for Rare Variant Association

Identifying rare variants using a Bayesian regression approach

Integrative Analysis of Sequencing and Array Genotype Data for Discovering Disease Associations with Rare Mutations

Testing rare variants for association with diseases: a Bayesian marker selection approach

A Variational Bayes Discrete Mixture Test for Rare Variant Association

Approach of Fusing Multiple Tests to Analyzing Rare Genetic Variants

Methods for Association Analysis and Meta‐Analysis of Rare Variants in Families

Detecting functional rare variants by collapsing and incorporating functional annotation in Genetic Analysis Workshop 17 mini-exome data

Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results

Association analysis of rare variants with quantitative trait based on minimum P-value

Assessing association between protein truncating variants and quantitative traits

A LASSO-based Approach to Analyzing Rare Variants in Genetic Association Studies

A Robust Model-free Approach for Rare Variants Association Studies Incorporating Gene-Gene and Gene-Environmental Interactions

Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits

Rare Variants Analysis by Risk-Based Variable-Threshold Method

Association Analysis and Meta-Analysis of Multi-Allelic Variants for Large-Scale Sequence Data

Improved Detection of Rare Genetic Variants for Diseases

Extending Rare-Variant Testing Strategies: Analysis of Noncoding Sequence and Imputed Genotypes

Meta-Analysis of Gene Level Association Tests

Estimating Genetic Effects and Quantifying Missing Heritability Explained by Identified Rare-Variant Associations