Two-Stage Testing in a high dimensional setting

Marianne A Jonker, Luc van Schijndel, Eric Cator
2024-06-25
Abstract: In a high-dimensional regression setting in which the number of variables ($p$) is much larger than the sample size ($n$), the number of possible two-way interactions between the variables is immense. If the number of variables is on the order of one million, as is common in, e.g., genetics, the number of two-way interactions is on the order of one million squared. When searching for two-way interactions, testing all pairs one-by-one is computationally infeasible, and the multiple testing correction will be severe. In this paper we describe a two-stage testing procedure consisting of a screening stage and an evaluation stage. It is proven that, under some assumptions, the test statistics in the two stages are asymptotically independent. As a result, multiplicity correction in the second stage is needed only for the number of statistical tests that are actually performed in that stage. This increases the power of the testing procedure. Also, since the testing procedure in the first stage is computationally simple, the computational burden is lowered. Simulations have been performed for multiple settings and regression models (generalized linear models and the Cox proportional hazards (PH) model) to study the performance of the two-stage testing procedure. The results show type I error control and an increase in power compared to the procedure in which the pairs are tested one-by-one.
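To make the multiplicity argument concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It counts the $p(p-1)/2$ candidate pairs, compares a Bonferroni threshold over all pairs with one over only the pairs that survive screening, and wraps both stages in a hypothetical `two_stage` helper. The names `n_pairs`, `bonferroni_threshold`, `two_stage` and `alpha_screen` are illustrative; the screening and evaluation p-values are placeholders supplied by the caller (the toy values below are made up), not the paper's test statistics. Bonferroni is used only as one simple choice of correction, and correcting for just the `m` selected pairs is justified only when the stage-1 and stage-2 statistics are (asymptotically) independent, which the paper proves under its own assumptions.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]


def n_pairs(p: int) -> int:
    """Number of distinct two-way interactions among p variables, p(p-1)/2."""
    return comb(p, 2)


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level that keeps the family-wise error rate at most alpha over m tests."""
    return alpha / m


def two_stage(
    pairs: List[Pair],
    screen_p: Callable[[Pair], float],
    eval_p: Callable[[Pair], float],
    alpha_screen: float,
    alpha: float,
) -> Dict[str, object]:
    """Hypothetical two-stage procedure: cheap screening, then corrected evaluation.

    screen_p and eval_p are placeholders for the stage-1 and stage-2 tests;
    they are NOT the statistics used in the paper.
    """
    # Stage 1 (screening): run the cheap test on every pair and keep promising ones.
    selected = [pr for pr in pairs if screen_p(pr) <= alpha_screen]
    m = len(selected)
    if m == 0:
        return {"selected": [], "threshold": None, "significant": []}
    # Stage 2 (evaluation): correct only for the m tests actually performed here.
    # Assumption: valid only if stage-1 and stage-2 statistics are (asymptotically) independent.
    threshold = bonferroni_threshold(alpha, m)
    significant = [pr for pr in selected if eval_p(pr) <= threshold]
    return {"selected": selected, "threshold": threshold, "significant": significant}


# Scale of the problem for p = 10**6 variables.
total = n_pairs(10**6)
print(total, bonferroni_threshold(0.05, total))

# Toy example with made-up p-values for p = 5 variables (10 pairs).
pairs = list(combinations(range(5), 2))
screen = {pr: 0.5 for pr in pairs}
screen[(0, 1)], screen[(2, 3)] = 0.001, 0.02
evaluation = {pr: 0.9 for pr in pairs}
evaluation[(0, 1)], evaluation[(2, 3)] = 0.01, 0.04

result = two_stage(pairs, lambda pr: screen[pr], lambda pr: evaluation[pr], 0.05, 0.05)
print(result)

# One-by-one testing with Bonferroni over all 10 pairs, for comparison.
one_by_one = [pr for pr in pairs if evaluation[pr] <= bonferroni_threshold(0.05, len(pairs))]
print(one_by_one)
```

In the toy run, screening keeps two of the ten pairs, so the second-stage threshold is 0.05/2 = 0.025 and pair (0, 1) is declared significant, whereas a one-by-one Bonferroni correction over all ten pairs (threshold 0.005) declares nothing. For $p = 10^6$ the one-by-one threshold drops to roughly $10^{-13}$, which illustrates why the correction becomes so severe.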
Methodology, Statistics Theory