Test for the statistical significance of a treatment effect in the presence of hidden sub-populations

Bikram Karmakar, Kumaresh Dhara, Kushal Kumar Dey, Analabha Basu, Anil Ghosh
DOI: https://doi.org/10.48550/arXiv.1211.0032
2012-10-31
Computation
Abstract: To test the statistical significance of a treatment effect, we usually compare two parts of a population: one that is exposed to the treatment and one that is not. Standard parametric and nonparametric two-sample tests are often used for this comparison. But direct application of these tests can yield misleading results, especially when the population has hidden sub-populations and the effect of the sub-population difference on the study variables dominates the treatment effect. This problem becomes more evident if these sub-populations are represented in widely different proportions in the samples taken from the two parts, which are often referred to as the treatment group and the control group. In this article, we attempt to overcome this problem. Our proposed methods use suitable clustering algorithms to find the hidden sub-populations and then eliminate the sub-population effect by using suitable transformations. Standard two-sample tests, when applied to the transformed data, yield better results. Some simulated and real data sets are analyzed to show the utility of the proposed methods.
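The cluster, transform, then test idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' procedure: the abstract does not say which clustering algorithm, transformation, or test is used, so the example assumes a one-dimensional 2-means clustering, centring each observation by its cluster mean, and a Welch statistic with a normal-approximation p-value. All function names (`two_means_1d`, `centre_by_cluster`, `welch_z`, `simulate`) and all simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def two_means_1d(values, iterations=50):
    """Plain 1-D 2-means (Lloyd's algorithm); returns a 0/1 label per value."""
    centres = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
                  for v in values]
        new_centres = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centres.append(mean(members) if members else centres[k])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def centre_by_cluster(values, labels):
    """Assumed transformation: subtract each observation's cluster mean."""
    centres = {k: mean(v for v, lab in zip(values, labels) if lab == k)
               for k in set(labels)}
    return [v - centres[lab] for v, lab in zip(values, labels)]


def welch_z(x, y):
    """Welch statistic with a two-sided normal-approximation p-value."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    stat = (mean(x) - mean(y)) / se
    return stat, 2 * (1 - NormalDist().cdf(abs(stat)))


def simulate(n, share_a, effect, rng):
    """One group: hidden sub-population A has mean 0, B has mean 10."""
    return [rng.gauss(0.0 if rng.random() < share_a else 10.0, 1.0) + effect
            for _ in range(n)]


rng = random.Random(0)
n = 200
# True treatment effect is +1, but the groups mix A and B very differently.
treated = simulate(n, share_a=0.8, effect=1.0, rng=rng)
control = simulate(n, share_a=0.2, effect=0.0, rng=rng)

# Direct two-sample comparison on the raw outcomes.
naive_stat, naive_p = welch_z(treated, control)

# Find the hidden sub-populations on the pooled sample, then remove their effect.
pooled = treated + control
labels = two_means_1d(pooled)
adjusted = centre_by_cluster(pooled, labels)
adj_stat, adj_p = welch_z(adjusted[:n], adjusted[n:])

print(f"raw data:         statistic = {naive_stat:.2f}, p = {naive_p:.3g}")
print(f"cluster-centred:  statistic = {adj_stat:.2f}, p = {adj_p:.3g}")
```

In this setup the raw comparison points in the wrong direction, because the control group is dominated by the high-mean sub-population, while the comparison on cluster-centred data recovers a positive effect. Centring by the pooled cluster mean also absorbs part of the treatment shift, so the adjusted mean difference is smaller than the simulated effect of 1.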