WMW-A: Rank-based two-sample independent test for small sample sizes through an auxiliary sample

Yin Guo, Limin Li
DOI: https://doi.org/10.1101/2021.06.24.449844
2021-01-01
bioRxiv
Abstract: Two-sample independent test methods are widely used in case-control studies to identify significant changes or differences, for example, to identify key pathogenic genes by comparing gene expression levels in normal and disease cells. However, due to the high cost of data collection or labelling, many studies face the small-sample problem, for which traditional two-sample test methods often lose power. We propose a novel rank-based nonparametric test method, WMW-A, for the small-sample problem by introducing a three-sample statistic through an additional auxiliary sample. By combining the case, control and auxiliary samples, we construct a three-sample WMW-A statistic based on the gap between the average ranks of the case and control samples in the combined sample. Assuming that the auxiliary sample follows a mixture distribution of the case and control populations, we analyze the theoretical properties of the WMW-A statistic and approximate its theoretical power. Extensive simulation experiments and real applications on microarray gene expression data sets show that the WMW-A test can significantly improve the test power for the two-sample problem with small sample sizes, using either available unlabelled auxiliary data or generated auxiliary data.
Competing Interest Statement: The authors have declared no competing interest.
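The rank-gap idea described in the abstract can be illustrated with a minimal Python sketch. It pools the case, control and auxiliary samples, ranks the pooled values, and takes the difference between the mean rank of the case sample and the mean rank of the control sample. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact normalization of the WMW-A statistic, the function names `average_ranks`, `rank_gap` and `permutation_pvalue` are hypothetical, and the Monte Carlo permutation p-value is a swapped-in technique rather than the theoretical power approximation analyzed in the paper.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_gap(case, control, auxiliary):
    """Mean rank of case minus mean rank of control in the pooled sample."""
    ranks = average_ranks(list(case) + list(control) + list(auxiliary))
    m, n = len(case), len(control)
    return sum(ranks[:m]) / m - sum(ranks[m:m + n]) / n


def permutation_pvalue(case, control, auxiliary, n_perm=2000, seed=0):
    """Two-sided Monte Carlo p-value (assumption: not the paper's procedure).

    Case/control labels are shuffled while the auxiliary sample stays fixed.
    """
    rng = random.Random(seed)
    m = len(case)
    observed = abs(rank_gap(case, control, auxiliary))
    labelled = list(case) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labelled)
        if abs(rank_gap(labelled[:m], labelled[m:], auxiliary)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up toy data, for illustration only.
case = [5.1, 6.3, 5.8]
control = [3.2, 4.0, 3.7]
auxiliary = [3.5, 5.5, 4.2, 6.0]
gap = rank_gap(case, control, auxiliary)
p_value = permutation_pvalue(case, control, auxiliary)
```

In this toy example the case values occupy pooled ranks 6, 10 and 8 and the control values ranks 1, 4 and 3, so the gap is 8 - 8/3 = 16/3. Because the auxiliary observations are interleaved with the labelled ones, they change the rank scores that each case and control observation receives, which is how the third sample enters the statistic.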