A computational model for sample dependence in hypothesis testing of genome data

Sunhee Kim, Chang-Yong Lee
DOI: https://doi.org/10.1007/s40042-024-01100-z
2024-05-31
Journal of the Korean Physical Society
Abstract: Statistical hypothesis testing assumes that the samples being analyzed are statistically independent, meaning that the occurrence of one sample does not affect the probability of the occurrence of another. In practice, this assumption does not always hold. When samples are not independent, their interdependence must be taken into account when interpreting the results of a hypothesis test. In this study, we address sample dependence in hypothesis testing by introducing the concept of an adjusted sample size, which provides additional information about the test results and is particularly useful when samples exhibit dependence. To determine the adjusted sample size, we use network theory to quantify sample dependence and model the variance of the network density as a function of sample size. We then estimate the adjusted sample size from the variance of the network density, which reflects the degree of sample dependence. Through simulations, we demonstrate that dependent samples yield a higher variance in network density than independent samples, validating our method for estimating the adjusted sample size. Furthermore, we apply the proposed method to genomic datasets, estimating the adjusted sample size to account for sample dependence in hypothesis testing. The adjusted sample size helps guide the interpretation of test results and supports more accurate data analysis.
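The simulation claim in the abstract (dependent samples give a higher variance in network density than independent ones) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the network is built by thresholding absolute pairwise Pearson correlations between samples, dependence is induced by a shared latent vector, and the names `network_density`, `density_variance`, `shared_weight`, and `threshold` are hypothetical.

```python
import numpy as np


def network_density(samples, threshold=0.3):
    """Fraction of sample pairs linked by an edge.

    Assumption: two samples are linked when the absolute Pearson
    correlation of their feature vectors exceeds `threshold`.
    `samples` has shape (n_samples, n_features).
    """
    corr = np.corrcoef(samples)                 # (n_samples, n_samples)
    upper = np.triu_indices(corr.shape[0], k=1)  # each pair counted once
    return float(np.mean(np.abs(corr[upper]) > threshold))


def density_variance(n_samples, n_features=50, shared_weight=0.0,
                     n_reps=200, threshold=0.3, seed=0):
    """Variance of the network density over repeated simulated datasets.

    Assumption: dependence is modeled by adding the same latent vector,
    scaled by `shared_weight`, to every sample. With shared_weight=0
    the samples are independent.
    """
    rng = np.random.default_rng(seed)
    densities = np.empty(n_reps)
    for r in range(n_reps):
        noise = rng.standard_normal((n_samples, n_features))
        common = rng.standard_normal(n_features)  # shared across samples
        samples = noise + shared_weight * common
        densities[r] = network_density(samples, threshold)
    return float(densities.var(ddof=1))


if __name__ == "__main__":
    for n in (10, 20, 40):
        v_ind = density_variance(n, shared_weight=0.0)
        v_dep = density_variance(n, shared_weight=0.6)
        print(f"n={n:3d}  independent={v_ind:.2e}  dependent={v_dep:.2e}")
```

Running the script prints the two variances at a few sample sizes, which shows how the variance changes with sample size under each setting. How the paper maps an observed variance to an adjusted sample size is not specified in the abstract, so that step is left out.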
physics, multidisciplinary