Abstract: Genome-wide association studies have identified hundreds of genetic variants associated with complex diseases, yet most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for the missing heritability. Identifying rare variants through large-scale resequencing is becoming increasingly important but remains prohibitively expensive despite the rapid decline in sequencing costs. Group-testing-based overlapping pool sequencing, in which pooled rather than individual samples are sequenced, can greatly reduce both the effort of sample preparation and the cost of screening for rare variants. Here, we propose an overlapping pool sequencing strategy to screen for rare variants with optimal sequencing depth, together with a corresponding cost model. We formulate a model to compute the optimal depth for sufficient observations of variants in pooled sequencing. Using the shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing can be selected to minimize cost while guaranteeing accuracy. Because of the mixing constraint and the high depth required for pooled sequencing, our results show that it is more cost-effective to divide a large population into smaller blocks, each tested independently with an optimized strategy. Finally, we conducted an experiment to screen for carriers of variants with a frequency of 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. With overlapping pool sequencing, the cost of screening for carriers of variants with a frequency of 1% in 200 diploid individuals dropped to at least 66% when the target sequencing region was set to 30 Mb.
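To make the overlapping-pool idea concrete, the following is a minimal sketch of how a shifted transversal design can assign samples to pools, and how carriers can be read back from positive pools. It follows the generally published form of the design and is not taken from this paper: the function names (`std_assignment`, `positive_pools`, `decode`), the toy parameters, and the assumption of error-free pool calls are all illustrative. The sequencing-depth and cost models from the abstract are not represented.

```python
from itertools import combinations


def std_assignment(n, q, k):
    """Assign n samples to k layers of q pools each (q is assumed prime).

    Returns a dict mapping sample index -> list of pool indices, one per layer.
    Hypothetical helper illustrating the shifted transversal design.
    """
    if k > q + 1:
        raise ValueError("at most q + 1 layers are available")
    # Compression power: smallest gamma with q**(gamma + 1) >= n.
    gamma = 0
    while q ** (gamma + 1) < n:
        gamma += 1

    def pool_index(i, j):
        if j < q:
            # Regular layer j: evaluate the base-q digits of i as a polynomial in j.
            return sum((j ** c) * (i // q ** c) for c in range(gamma + 1)) % q
        # Special last layer: uses only the leading base-q digit of i.
        return (i // q ** gamma) % q

    return {i: [pool_index(i, j) for j in range(k)] for i in range(n)}


def positive_pools(assignment, carriers):
    """Pools that contain at least one carrier (assumes error-free pool calls)."""
    return {(j, p) for i in carriers for j, p in enumerate(assignment[i])}


def decode(assignment, positives):
    """Candidate carriers: samples whose pools are positive in every layer."""
    return {
        i for i, pools in assignment.items()
        if all((j, p) in positives for j, p in enumerate(pools))
    }


# Toy example: 9 samples, 3 layers of 3 pools each, one carrier (sample 4).
assignment = std_assignment(n=9, q=3, k=3)
print(decode(assignment, positive_pools(assignment, {4})))  # {4}
```

In this toy setting every sample is sequenced in three pools and any two samples share at most one pool, which is why a single carrier can be identified from 9 pools without individual sequencing. With more carriers or noisy pool calls, this simple decoding rule can return extra candidates.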

Efficient Utilization of Rare Variants for Detection of Disease-Related Genomic Regions

Improved Detection of Rare Genetic Variants for Diseases

[Family-Based Association Tests for Rare Variants].

Novel Association Strategy with Copy Number Variation for Identifying New Risk Loci of Human Diseases

Testing rare variants for association with diseases: a Bayesian marker selection approach.

Integrative Analysis of Sequencing and Array Genotype Data for Discovering Disease Associations with Rare Mutations

Testing Genetic Association with Rare Variants in Admixed Populations.

Population structure analysis using rare and common functional variants

Rare coding variant analysis for human diseases across biobanks and ancestries

Detecting multiple variants associated with disease based on sequencing data of case–parent trios

Rare Variants Analysis by Risk-Based Variable-Threshold Method

A Variational Bayes Discrete Mixture Test for Rare Variant Association

Detecting functional rare variants by collapsing and incorporating functional annotation in Genetic Analysis Workshop 17 mini-exome data

Block-based Association Tests for Rare Variants Using Kullback–Leibler Divergence

Identifying rare variants using a Bayesian regression approach

Improving Power for Robust Trans-Ethnic Meta-Analysis of Rare and Low-Frequency Variants with A Partitioning Approach

A Probabilistic Method for Identifying Rare Variants Underlying Complex Traits

Association Analysis and Meta-Analysis of Multi-Allelic Variants for Large-Scale Sequence Data

Identifying rare variants with optimal depth of coverage and cost-effective overlapping pool sequencing.

Approach of Fusing Multiple Tests to Analyzing Rare Genetic Variants

Methods for Association Analysis and Meta‐Analysis of Rare Variants in Families