Processing imbalanced medical data at the data level with assisted-reproduction data as an example

Junliang Zhu, Shaowei Pu, Jiaji He, Dongchao Su, Weijie Cai, Xueying Xu and Hongbo Liu
DOI: https://doi.org/10.1186/s13040-024-00384-y
2024-09-05
BioData Mining
Abstract: Data imbalance is a pervasive issue in medical data mining, often leading to biased and unreliable predictive models. This study aims to address the urgent need for effective strategies to mitigate the impact of data imbalance on classification models. We focus on quantifying the effects of different imbalance degrees and sample sizes on model performance, identifying optimal cut-off values, and evaluating the efficacy of various methods to enhance model accuracy in highly imbalanced and small sample size scenarios.
mathematical & computational biology
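To make "imbalance degree" and "data-level" processing concrete, here is a minimal sketch under stated assumptions. It measures imbalance as the minority-class proportion and the majority-to-minority ratio, then rebalances a binary dataset by plain random oversampling (duplicating minority rows). Both the metrics and the resampling method are illustrative choices. The abstract does not say which metrics or resampling methods the authors used, and the helper names `imbalance_stats` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def imbalance_stats(labels):
    """Return (minority-class proportion, majority:minority ratio).

    Illustrative metrics only; not necessarily how the paper defines imbalance degree.
    """
    counts = Counter(labels)
    minority = min(counts.values())
    majority = max(counts.values())
    return minority / len(labels), majority / minority


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until the two classes are equal in size.

    Assumes binary labels. This is plain random oversampling, one of several
    data-level options, used here as a stand-in and not as the authors' method.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    majority_count = max(counts.values())
    minority_rows = [x for x, label in zip(X, y) if label == minority_label]
    # Sample with replacement from the existing minority rows.
    n_extra = majority_count - counts[minority_label]
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return list(X) + extra, list(y) + [minority_label] * n_extra


if __name__ == "__main__":
    # Hypothetical toy data: 2 positives among 20 samples (10% minority).
    X = [[i] for i in range(20)]
    y = [1] * 2 + [0] * 18
    print(imbalance_stats(y))                     # (0.1, 9.0)
    X_bal, y_bal = random_oversample(X, y)
    print(sorted(Counter(y_bal).items()))         # [(0, 18), (1, 18)]
```

Random oversampling only repeats existing minority rows, so on very small samples it can encourage overfitting. Synthetic-sample methods and undersampling are common alternatives at the same data level.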