The hidden factor: accounting for covariate effects in power and sample size computation for a binary trait

Ziang Zhang, Lei Sun
DOI: https://doi.org/10.48550/arXiv.2203.15641
2022-10-04
Abstract: Accurate estimation of power and sample size is crucial to the design and analysis of genetic association studies. When analyzing a binary trait via logistic regression, important covariates such as age and sex are typically included in the model. However, their effects are rarely properly considered in power or sample size computation during study planning. Unlike when analyzing a continuous trait, the power of association testing between a binary trait and a genetic variant depends explicitly on covariate effects, even under the assumption of gene-environment independence. Earlier work recognized this hidden factor, but the implemented methods are not flexible. We thus propose and implement a generalized method for estimating power and sample size for (discovery or replication) association studies of binary traits that a) accommodates different types of non-genetic covariates E, b) deals with different types of G-E relationships, and c) is computationally efficient. Extensive simulation studies show that the proposed method is accurate and computationally efficient for both prospective and retrospective sampling designs with various covariate structures. A proof-of-principle application focuses on the understudied African sample in the UK Biobank data. Results show that, in contrast to studying the continuous blood pressure trait, ignoring the covariate effects of age and sex when analyzing the binary hypertension trait leads to overestimated power and underestimated replication sample size.
Methodology, Applications, Computation
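
The abstract's central claim, that power for a binary trait depends on covariate effects even when G and E are independent, can be illustrated with a small sketch. This is not the authors' method or software. It is a textbook asymptotic approximation for the two-sided Wald test under prospective sampling, based on the expected Fisher information of the logistic model logit P(Y=1) = beta0 + beta_g*G + beta_e*E. The function name `wald_power`, the Hardy-Weinberg genotype distribution, the standard normal covariate E, and all parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, reused for pdf/cdf/quantiles


def wald_power(n, beta0, beta_g, beta_e, maf=0.3, alpha=0.05, grid=2001):
    """Approximate power of the two-sided Wald test for beta_g.

    Assumed (hypothetical) setup: G ~ Binomial(2, maf) under Hardy-Weinberg
    equilibrium, E ~ N(0, 1) independent of G, prospective sampling of n
    individuals, and a correctly specified logistic model.
    """
    genotypes = [(0, (1 - maf) ** 2), (1, 2 * maf * (1 - maf)), (2, maf ** 2)]

    # Expected per-observation Fisher information E[p(1-p) x x^T] with
    # x = (1, G, E). The expectation over E uses a midpoint rule on [-8, 8].
    lo, hi = -8.0, 8.0
    step = (hi - lo) / grid
    info = [[0.0] * 3 for _ in range(3)]
    for g, prob_g in genotypes:
        for k in range(grid):
            e = lo + (k + 0.5) * step
            weight = prob_g * _STD_NORMAL.pdf(e) * step
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta_g * g + beta_e * e)))
            v = p * (1.0 - p)
            x = (1.0, float(g), e)
            for i in range(3):
                for j in range(3):
                    info[i][j] += weight * v * x[i] * x[j]

    # Only the (G, G) entry of the inverse information matrix is needed.
    m = info
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    inv_gg = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det

    # Asymptotic standard error of the beta_g estimate is sqrt(inv_gg / n).
    ncp = beta_g / math.sqrt(inv_gg / n)
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(ncp - z) + _STD_NORMAL.cdf(-ncp - z)


if __name__ == "__main__":
    # Same n and genetic effect; only the covariate effect beta_e changes.
    for beta_e in (0.0, 0.5, 1.0):
        power = wald_power(n=1000, beta0=0.0, beta_g=0.2, beta_e=beta_e)
        print(f"beta_e = {beta_e:.1f}: approximate power = {power:.3f}")
```

In this sketch, with the intercept held at zero, a larger beta_e lowers the average of p(1-p) and therefore the information about beta_g, so a calculation that sets beta_e to zero reports more power than one that accounts for the covariate. This matches the direction of the bias described in the abstract, though the sketch does not cover retrospective sampling or dependent G and E.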