Simultaneous Analysis of Common and Rare Variants in Complex Traits: Application to SNPs (Scarvasnp)

Guanjie Chen,Ao Yuan,Yanxun Zhou,Amy R. Bentley,Jie Zhou,Weiping Chen,Daniel Shriner,Adebowale Adeyemo,Charles N. Rotimi
DOI: https://doi.org/10.4137/bbi.s9966
2012-01-01
Bioinformatics and Biology Insights
Abstract:Advances in technology and reduced costs are facilitating large-scale sequencing of genes and exomes as well as entire genomes. Recently, we described an approach based on haplotypes called SCARVA 1 that enables the simultaneous analysis of the association between rare and common variants in disease etiology. Here, we describe an extension of SCARVA that evaluates individual markers instead of haplotypes. This modified method (SCARVAsnp) is implemented in four stages. First, all common variants in a pre-specified region (eg, gene) are evaluated individually. Second, a union procedure is used to combined all rare variants (RVs) in the index region, and the ratio of the log likelihood with one RV excluded to the log likelihood of a model with all the collapsed RVs is calculated. On the basis of previously-reported simulation studies, 1 a likelihood ratio ≥ 1.3 is considered statistically significant. Third, the direction of the association of the removed RV is determined by evaluating the change in λ values with the inclusion and exclusion of that RV. Lastly, significant common and rare variants, along with covariates, are included in a final regression model to evaluate the association between the trait and variants in that region. We apply simulated and real data sets to show that the method is simple to use, computationally effcient, and that it can accurately identify both common and rare risk variants. This method overcomes several limitations of existing methods. For example, SCARVAsnp limits loss of statistical power by not including variants that are not associated with the trait of interest in the final model. Also, SCARVAsnp takes into consideration the direction of association by effectively modelling positively and negatively associated variants.
What problem does this paper attempt to address?