Abstract:

Background: Next-generation sequencing technologies can detect the entire spectrum of genomic variation and provide a powerful tool for systematically exploring common, low-frequency and rare variants across the whole genome. However, the current paradigm for genome-wide association studies (GWAS) is to catalogue and genotype common variants (MAF > 5%). Methods and study designs for testing the association of low-frequency (0.5% < MAF ≤ 5%) and rare (MAF ≤ 0.5%) variants have not been thoroughly investigated. The 1000 Genomes Project is one endeavour toward this goal: it aims to characterize the pattern of human genetic variation down to MAF = 1% as a foundation for association studies. In this report, we explore different strategies and study designs for near-future GWAS in the post-era, using both the low-coverage pilot data and the exon pilot data of the 1000 Genomes Project.

Results: We investigated the linkage disequilibrium (LD) pattern among common and low-frequency SNPs and its implications for association studies. We found that the LD between pairs of low-frequency alleles, and between low-frequency and common alleles, is much weaker than the LD between pairs of common alleles. We examined various tagging designs, with and without statistical imputation, and compared their power against de novo resequencing for mapping causal variants under various disease models. We used the low-coverage pilot data, which contain ~14 M SNPs, as a hypothetical genotyping-array platform (Pilot 14 M) to assess its impact on tag SNP selection, mapping coverage and the power of association tests. Even after imputation, 45.4% of low-frequency SNPs remained untaggable, and the Pilot 14 M array covered only 67.7% of low-frequency variation.

Conclusions: These results suggest that GWAS based on SNP arrays would be ill-suited for association studies of low-frequency variation.
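Part of the weak LD reported above follows directly from allele frequencies: the r² between two biallelic SNPs is capped by how different their minor allele frequencies are. The Python sketch below is illustrative only and is not the authors' pipeline. It bins MAFs with the thresholds quoted in the Background, computes r² from a haplotype frequency, and evaluates the standard upper bound on r² for two SNPs whose minor alleles are in coupling phase. The `tag_coverage` helper and its r² ≥ 0.8 threshold are hypothetical; 0.8 is a common tagging convention but is not stated in this abstract.

```python
from typing import Sequence


def maf_class(maf: float) -> str:
    """Bin a minor allele frequency: rare <= 0.5% < low <= 5% < common."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf <= 0.005:
        return "rare"
    if maf <= 0.05:
        return "low"
    return "common"


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2.
    p_ab: frequency of haplotypes carrying both A and B.
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def max_r_squared(p1: float, p2: float) -> float:
    """Largest attainable r^2 for two SNPs with minor allele frequencies p1, p2.

    Reached when the rarer minor allele occurs only on haplotypes carrying the
    other minor allele (coupling phase); equals 1 only when p1 == p2.
    """
    lo, hi = sorted((p1, p2))
    return lo * (1 - hi) / (hi * (1 - lo))


def tag_coverage(r2_rows: Sequence[Sequence[float]], threshold: float = 0.8) -> float:
    """Hypothetical helper: fraction of target SNPs tagged by an array.

    r2_rows[i][j] is r^2 between target SNP i and array SNP j; a target counts
    as covered if its best r^2 with any array SNP reaches the threshold.
    """
    if not r2_rows:
        raise ValueError("need at least one target SNP")
    covered = sum(1 for row in r2_rows if row and max(row) >= threshold)
    return covered / len(r2_rows)


if __name__ == "__main__":
    print(maf_class(0.01))                     # low
    print(round(max_r_squared(0.01, 0.2), 4))  # 0.0404: low-frequency vs common
    print(round(max_r_squared(0.01, 0.05), 4)) # 0.1919: two low-frequency SNPs
```

Even under perfect co-occurrence, a 1% allele and a 20% allele cannot exceed r² ≈ 0.04, so a common tag SNP cannot stand in for a low-frequency variant at any conventional r² threshold, regardless of array density.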

Implication of Next-Generation Sequencing on Association Studies