Abstract: Recent studies have shown that the human genome has a haplotype block structure, such that it can be divided into discrete blocks of limited haplotype diversity. In each block, a small fraction of single-nucleotide polymorphisms (SNPs), referred to as "tag SNPs," can be used to distinguish a large fraction of the haplotypes. These tag SNPs are potentially very useful for association studies, because it may not be necessary to genotype all SNPs; however, their usefulness depends on how much power is lost. Here we develop a simulation study to quantitatively assess the power loss for a variety of study designs, including case-control designs and case-parental control designs. First, a number of data sets containing case-parental or case-control samples are generated on the basis of a disease model. Second, a small fraction of case and control individuals in each data set are genotyped at all the loci, and a dynamic programming algorithm is used to determine the haplotype blocks and the tag SNPs from the genotypes of the sampled individuals. Third, the statistical power of tests is evaluated on the basis of three kinds of data: (1) all of the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, and (3) the same number of randomly chosen SNPs as the number of tag SNPs and the corresponding haplotypes. We study the power of different association tests with a variety of disease models and block-partitioning criteria. Our study indicates that genotyping effort can be significantly reduced by using tag SNPs, without much loss of power. The size of the loss depends on the specific haplotype block-partitioning algorithm and the disease model. When the identified tag SNPs are only 25% of all the SNPs, the power is reduced by only 4%, on average, compared with a power loss of approximately 12% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. When the identified tag SNPs are approximately 14% of all the SNPs, the power is reduced by approximately 9%, compared with a power loss of approximately 21% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. Our study also indicates that haplotype-based analysis can be much more powerful than marker-by-marker analysis.
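The abstract's third step, estimating power by simulation, can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single disease SNP, a multiplicative disease model with a hypothetical per-allele relative risk `r`, controls approximated by the general population, and a plain allelic chi-square test at the 5% level in place of the haplotype-based tests studied in the paper. All function names and parameters are illustrative.

```python
import random

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF_05 = 3.841


def case_control_allele_freqs(p, r):
    """Risk-allele frequency in cases and controls.

    Assumes a multiplicative model (relative risks 1, r, r^2), under which
    case genotypes stay in Hardy-Weinberg proportions. Controls are
    approximated by the general population (rare-disease assumption).
    """
    p_case = r * p / (r * p + (1 - p))
    return p_case, p


def allelic_chi2(a_case, n_case, a_ctrl, n_ctrl):
    """Pearson chi-square for a 2x2 allele-count table, no continuity
    correction. n_case and n_ctrl are numbers of chromosomes."""
    total = n_case + n_ctrl
    a_total = a_case + a_ctrl
    b_total = total - a_total
    if a_total == 0 or b_total == 0:
        return 0.0  # monomorphic sample: no evidence either way
    b_case = n_case - a_case
    b_ctrl = n_ctrl - a_ctrl
    num = total * (a_case * b_ctrl - a_ctrl * b_case) ** 2
    return num / (n_case * n_ctrl * a_total * b_total)


def estimate_power(p, r, n_cases, n_controls, n_reps=1000, seed=0):
    """Fraction of simulated data sets in which the allelic test rejects."""
    rng = random.Random(seed)
    p_case, p_ctrl = case_control_allele_freqs(p, r)
    n_case_chr, n_ctrl_chr = 2 * n_cases, 2 * n_controls
    rejections = 0
    for _ in range(n_reps):
        # Draw risk-allele counts chromosome by chromosome.
        a_case = sum(rng.random() < p_case for _ in range(n_case_chr))
        a_ctrl = sum(rng.random() < p_ctrl for _ in range(n_ctrl_chr))
        stat = allelic_chi2(a_case, n_case_chr, a_ctrl, n_ctrl_chr)
        if stat > CHI2_CRIT_1DF_05:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print("power, r = 2.0:", estimate_power(0.2, 2.0, 200, 200))
    print("type I error, r = 1.0:", estimate_power(0.2, 1.0, 200, 200))
```

A comparison in the spirit of the abstract would repeat such an estimate for each marker set (all SNPs, tag SNPs, randomly chosen SNPs) and report the differences in the rejection fractions.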

Alternative Methods for H1 Simulations in Genome Wide Association Studies

Non-subjective power analysis to detect G*E interactions in Genome-Wide Association Studies in presence of confounding factor

Principles for the Post-Gwas Functional Characterisation of Risk Loci

Using Alternative Definitions of Controls to Increase Statistical Power in GWAS

Family-Based Association Tests for Genomewide Association Scans

Power Estimation Of Multiple Snp Association Test Of Case-Control Study And Application

Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

Integrative Analysis of Sequencing and Array Genotype Data for Discovering Disease Associations with Rare Mutations

MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes

Small-group originating model: Optimized individual-level GWAS simulation featured by SLiM and using open-access data

Univariate/Multivariate Genome-Wide Association Scans Using Data from Families and Unrelated Samples

Accurate cross-platform GWAS analysis via two-stage imputation

The Choice of Null Distributions for Detecting Gene-Gene Interactions in Genome-Wide Association Studies

Gains in Power for Exhaustive Analyses of Haplotypes Using Variable-Sized Sliding Window Strategy: a Comparison of Association-Mapping Strategies.

Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies

Extending Rare-Variant Testing Strategies: Analysis of Noncoding Sequence and Imputed Genotypes

An Efficient Method for Casual Snps Detection in Genome-Wide Case-Control Study

Haplotype Block Structure and Its Applications to Association Studies: Power and Study Designs

THE HAPLOTYPE LINKAGE DISEQUILIBRIUM TEST FOR GENOME-WIDE SCREENS: ITS POWER AND STUDY DESIGN

SPS: A Simulation Tool for Calculating Power of Set‐Based Genetic Association Tests

Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole genome sequencing studies