Abstract: With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ~860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ~6,700 SNPs in a total of 168 individuals.
In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free.
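The idea of testing association with probabilistically estimated genotypes can be illustrated with a minimal sketch. It assumes the posterior genotype probabilities have already been computed elsewhere (for example by a Lander-Green or Elston-Stewart implementation), and it uses an ordinary least-squares regression on the expected allele count. This is a simplification: it treats individuals as unrelated and is not the family-based test described in the abstract. The function names `expected_dosage` and `dosage_regression` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def expected_dosage(genotype_probs):
    """Convert per-individual genotype probabilities into expected allele counts.

    Each entry is (P(0 copies), P(1 copy), P(2 copies)) of the tested allele.
    An observed genotype is simply a probability vector such as (0, 0, 1).
    """
    return [p1 + 2.0 * p2 for (_p0, p1, p2) in genotype_probs]


def dosage_regression(phenotype, dosage):
    """Regress a quantitative phenotype on expected allele count.

    Returns (beta, standard_error, chi_square). Assumption: individuals are
    treated as independent, which ignores the within-family correlation that
    a real pedigree analysis would have to model.
    """
    n = len(phenotype)
    if n != len(dosage) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    xc = [x - mean_x for x in dosage]
    yc = [y - mean_y for y in phenotype]
    sxx = sum(x * x for x in xc)
    if sxx == 0.0:
        raise ValueError("dosage has no variation")
    beta = sum(x * y for x, y in zip(xc, yc)) / sxx
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    rss = sum((y - beta * x) ** 2 for x, y in zip(xc, yc))
    se = sqrt(rss / (n - 2) / sxx)
    chi_square = (beta / se) ** 2 if se > 0 else float("inf")
    return beta, se, chi_square
```

In this sketch, individuals with high-density genotypes contribute dosages of exactly 0, 1, or 2, while individuals whose genotypes were inferred contribute fractional values, so their phenotypes still enter the test.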

A Method for Predicting Allelic Variants of Single Nucleotide Polymorphisms

Current limitations of SNP data from the public domain for studies of complex disorders: a test for ten candidate genes for obesity and osteoporosis

RegSNPs: A Strategy for Prioritizing Regulatory Single Nucleotide Substitutions.

A Comparison on Predicting Functional Impact of Genomic Variants.

A Non-Parametric Method for Building Predictive Genetic Tests on High-Dimensional Data

Family-Based Association Tests for Genomewide Association Scans

Improved Detection of Rare Genetic Variants for Diseases

A Probabilistic Model to Predict Clinical Phenotypic Traits from Genome Sequencing

Prediction of Functional Regulatory SNPs in Monogenic and Complex Disease.

A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants

Large-Scale Validation of Single Nucleotide Polymorphisms in Gene Regions

Probability Theory-Based SNP Association Study Method for Identifying Susceptibility Loci and Genetic Disease Models in Human Case-Control Data

An Integrated Framework for Analysis and Prediction of Impact of Single Nucleotide Polymorphism Associated with Human Diseases

A Novel Method for in Silico Identification of Regulatory SNPs in Human Genome.

Polymorphisms Affecting Gene Transcription and mRNA Processing in Pharmacogenetic Candidate Genes: Detection Through Allelic Expression Imbalance in Human Target Tissues

Prediction and functional analysis of single nucleotide polymorphisms.

Assessing the function of genetic variants in candidate gene association studies

A Novel Statistical Method for Interpreting the Pathogenicity of Rare Variants

A general approach to single-nucleotide polymorphism discovery

A computational method for identification of disease-associated non-coding SNPs in human genome