Abstract

Background: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between polymorphic markers and a putative disease locus. Most methods widely used for such analyses are vulnerable to several key demographic factors. As a result, they have poor statistical power to detect genuine associations and a high false-positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case–control samples with respect to the genotypic distribution at the loci in the populations under study. It also offers the flexibility to test for genetic association in the presence of confounding factors such as population structure and non-random sampling.

Results: We implemented this novel method, together with several popular methods from the GWAS literature, to re-analyze recently published Parkinson's disease (PD) case–control samples. The real-data analysis and computer simulations both show that, compared with its rivals, the new method provides significantly improved statistical power to detect associations. It is also robust to difficulties stemming from non-random sampling and genetic structure. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions, each smaller than 1 Mb. By contrast, the trend-test-based methods had previously detected only 6 SNPs, in two of these regions. The new method also discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidate genes FGF20 and PARK8, respectively, without incurring a false-positive risk.

Conclusions: We developed a novel likelihood-based method that adequately estimates LD and other population-model parameters from case and control samples. It readily integrates such samples from multiple genetically divergent populations and therefore enables statistically robust and powerful GWAS analyses.
On the basis of simulation studies and analyses of real datasets, we demonstrated a significant improvement of the new method over the non-parametric trend test, which is the most widely implemented method in the GWAS literature.
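The comparison above is against the trend test. In GWAS, this test is usually the Cochran-Armitage trend test on a 2 x 3 table of genotype counts. Below is a minimal sketch of that baseline test under stated assumptions, and it is not the likelihood-based method proposed here. The function name `cochran_armitage_trend` and the genotype ordering (aa, Aa, AA) are illustrative choices. The default scores (0, 1, 2) give the usual additive trend test.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k genotype table.

    cases, controls: genotype counts in the same order, e.g. (aa, Aa, AA).
    weights: genotype scores; (0, 1, 2) gives the additive trend test.
    Returns (z, two-sided p-value) from the normal approximation.
    """
    R = sum(cases)                       # total number of cases
    S = sum(controls)                    # total number of controls
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]  # genotype column totals

    # Trend statistic: T = sum_i t_i * (r_i * S - s_i * R)
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))

    # Null variance of T:
    # (R*S/N) * [sum_i t_i^2 c_i (N - c_i) - 2 * sum_{i<j} t_i t_j c_i c_j]
    v = sum(t * t * c * (N - c) for t, c in zip(weights, col))
    k = len(weights)
    for i in range(k):
        for j in range(i + 1, k):
            v -= 2 * weights[i] * weights[j] * col[i] * col[j]
    var_T = R * S / N * v
    if var_T <= 0:
        raise ValueError("zero variance: marker is monomorphic or a group is empty")

    z = T / math.sqrt(var_T)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Example: the A allele is enriched in cases.
z, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
print(round(z * z, 6), p)
```

In this example, z squared equals N r^2 = 20. Here r is the Pearson correlation between the genotype score and case status. The test therefore uses only a single linear trend in genotype scores. Unlike the likelihood-based approach described in the abstract, it does not model population structure or the non-random ascertainment of cases and controls.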

A Robust and Efficient Statistical Method for Genetic Association Studies Using Case and Control Samples from Multiple Cohorts
