Incorporation of Sparsity Information in Large-scale Multiple Two-sample $t$ Tests

Weidong Liu
DOI: https://doi.org/10.48550/arXiv.1410.4282
2014-10-16
Methodology
Abstract: Large-scale multiple two-sample Student's $t$ testing problems often arise in the statistical analysis of scientific data. To detect components whose values differ between two mean vectors, a well-known procedure is to apply the Benjamini-Hochberg (B-H) method to two-sample Student's $t$ statistics in order to control the false discovery rate (FDR). In many applications, the mean vectors are expected to be sparse or asymptotically sparse. For such data, can we gain more power than standard procedures, such as the B-H method with Student's $t$ statistics, while keeping the FDR under control? The answer is positive. By exploiting the possible sparsity information in the mean vectors, we present an uncorrelated screening-based (US) FDR control procedure, which is shown to be more powerful than the B-H method. The US testing procedure relies on a novel construction of screening statistics that are asymptotically uncorrelated with the two-sample Student's $t$ statistics. It differs from some existing testing-following-screening methods (Reiner et al., 2007; Yekutieli, 2008), in which independence between screening and testing is crucial for controlling the FDR, and that independence often requires additional data or sample splitting. Inappropriate sample splitting may reduce rather than improve statistical power. In contrast, the US procedure uses the original data and does not require sample splitting. Theoretical results show that the US testing procedure controls the FDR at the desired level asymptotically. Numerical studies indicate that the proposed procedure performs well.
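For reference, the baseline that the US procedure is compared against, the B-H method applied to coordinate-wise two-sample Student's $t$ statistics, can be sketched as below. This is a minimal illustration assuming the two samples are stored as (samples × coordinates) arrays and that NumPy and SciPy are available; the function names and the simulated sparse setting are hypothetical. It is not the paper's US procedure, whose screening statistics are not specified in this abstract.

```python
import numpy as np
from scipy import stats


def two_sample_t_pvalues(x, y):
    """Coordinate-wise pooled-variance two-sample Student's t tests.

    x has shape (n1, p) and y has shape (n2, p); column j holds the
    observations of coordinate j, so one two-sided test is run per column.
    """
    # equal_var=True gives the classical (pooled-variance) Student's t;
    # axis=0 runs all p tests in a single vectorised call.
    result = stats.ttest_ind(x, y, axis=0, equal_var=True)
    return result.statistic, result.pvalue


def benjamini_hochberg(pvalues, alpha=0.1):
    """Boolean mask of hypotheses rejected by the B-H step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Largest k with p_(k) <= alpha * k / m; reject the k smallest p-values.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    # Hypothetical sparse setting: only the first 20 of 1000 coordinates differ.
    rng = np.random.default_rng(0)
    n1, n2, p = 30, 30, 1000
    mu1 = np.zeros(p)
    mu1[:20] = 1.0
    x = rng.normal(loc=mu1, scale=1.0, size=(n1, p))
    y = rng.normal(loc=0.0, scale=1.0, size=(n2, p))
    t_stats, pvals = two_sample_t_pvalues(x, y)
    rejected = benjamini_hochberg(pvals, alpha=0.1)
    print("discoveries:", int(rejected.sum()))
```

This baseline ranks coordinates using only the $p$-values of the $t$ statistics, so it makes no use of possible sparsity in the mean vectors; that is the information the US procedure is designed to exploit.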