Sample pooling inflates error rates in between-sample comparisons: an empirical investigation of the statistical properties of count-based data

Megan N. Taylor, Nic M. Vega
DOI: https://doi.org/10.1101/2022.07.25.501406
2024-02-15
Abstract: Heterogeneity is ubiquitous across individuals in biological data, and sample batching, a form of biological averaging, inevitably loses information about this heterogeneity. The consequences for inference from biologically averaged data are frequently opaque, particularly when the underlying populations are non-normal. Here we investigate a case where biological averaging is common (count-based measurement of bacterial load in individuals) to empirically determine the consequences of batching. We find that both central measures and measures of variation on individual-based data contain biologically relevant information that is useful for distinguishing between groups, and that batch-based inference readily produces both false positive and false negative results in these comparisons. These results support the use of individual rather than batched samples when possible, illustrate the importance of understanding distributions across individuals within a sample frame, and indicate the need to consider effect size when drawing conclusions from biologically averaged data.
Microbiology
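The abstract's central point, that batching discards between-individual variation, can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: it assumes per-individual counts follow either a Poisson distribution or an overdispersed gamma-Poisson (negative binomial) mixture with the same mean, and it models a batch as the arithmetic mean of its members, ignoring plating and dilution error. The function names, the mean of 20, the dispersion shape of 0.5, the group size of 400 and the batch size of 10 are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson count using Knuth's multiplication method (adequate for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_group(n_individuals, mean, shape, rng):
    """Hypothetical per-individual counts.

    shape=None gives pure Poisson counts; otherwise each individual's rate is
    drawn from a gamma distribution with the given shape and the requested mean,
    giving a gamma-Poisson (negative binomial) mixture. This is an assumed model.
    """
    counts = []
    for _ in range(n_individuals):
        lam = mean if shape is None else rng.gammavariate(shape, mean / shape)
        counts.append(poisson(lam, rng))
    return counts


def batch_means(counts, batch_size):
    """Average consecutive individuals into batches; leftover individuals are dropped."""
    return [statistics.fmean(counts[i:i + batch_size])
            for i in range(0, len(counts) - batch_size + 1, batch_size)]


def summarize(counts, batch_size):
    """Center and spread of the same data, measured per individual and per batch."""
    batched = batch_means(counts, batch_size)
    return {
        "individual_mean": statistics.fmean(counts),
        "individual_var": statistics.variance(counts),
        "batch_mean": statistics.fmean(batched),
        "batch_var": statistics.variance(batched),
    }


rng = random.Random(2024)  # fixed seed so the run is reproducible
group_a = simulate_group(400, mean=20, shape=None, rng=rng)  # hypothetical low-dispersion group
group_b = simulate_group(400, mean=20, shape=0.5, rng=rng)   # hypothetical overdispersed group

summary_a = summarize(group_a, batch_size=10)
summary_b = summarize(group_b, batch_size=10)
for name, s in [("A (Poisson)", summary_a), ("B (gamma-Poisson)", summary_b)]:
    print(name, {key: round(value, 1) for key, value in s.items()})
```

Under these assumptions both groups have a mean of 20, while the theoretical individual-level variance is 20 for group A and 20 + 20²/0.5 = 820 for group B. Averaging over batches of size n shrinks the variance of the batch means by roughly a factor of n, so features of the individual distribution, such as group B's long right tail, cannot be recovered from the batched values. Only the standard library is used so the sketch runs without extra dependencies.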