Multimodal distribution and its impact on the accurate assessment of spermatozoa morphological data: Lessons from machine learning

D Stefanovski,M Schulze,G C Althouse
DOI: https://doi.org/10.1016/j.anireprosci.2024.107564
Abstract:Objective assessment of sperm morphology is an essential component for assessing ejaculate quality. Due to economic limitations, investigators often divert to conducting observational studies instead of experimental ones, which provide the strongest statistical power, yielding more heterogeneous data regardless of the number of data sources (barns/farms). Using such data inevitably leads to higher variances of estimates, which negatively impacts the statistical power of a study. In this article, we describe a statistical methodology called finite mixture modeling (FMM), which, based on the supplied data and assumed number of sub-classes, classifies the data into two or more homogeneous types of distributions and determines their fractional size relative to the entire cohort. The goal is to use statistical methods that will confound the variance of the sample. A figure from a previous publication was used to generate simulated data (n=1559) on the cytoplasmic droplet rate. We identified that a bi-modal distribution with two latent classes best described the simulated data. Post-hoc estimation showed that about 80 % of observations belonged to latent class 1, with 20 % in latent class 2. The FMM methodology identified a cutoff point of 8.7 %. Finally, when estimating the standard error for the total cohort, the FMM methodology yielded a 40 % reduction in the standard error compared to standard methodologies. In conclusion, here we show that FMM successfully confounded the variance of the data and, as such, yielded lower estimates of the variance than standard methodologies, increasing the statistical power of the cohort.
What problem does this paper attempt to address?