Handling imbalanced medical datasets: review of a decade of research

Mabrouka Salmi, Dalia Atif, Diego Oliva, Ajith Abraham, Sebastian Ventura
DOI: https://doi.org/10.1007/s10462-024-10884-2
IF: 9.588
2024-09-04
Artificial Intelligence Review
Abstract: Machine learning and medical diagnostic studies often struggle with the issue of class imbalance in medical datasets, complicating accurate disease prediction and undermining diagnostic tools. Despite ongoing research efforts, specific characteristics of medical data frequently remain overlooked. This article comprehensively reviews advances in addressing imbalanced medical datasets over the past decade, offering a novel classification of approaches into preprocessing, learning levels, and combined techniques. We present a detailed evaluation of the medical datasets and metrics used, synthesizing the outcomes of previous research to reflect on the effectiveness of the methodologies despite methodological constraints. Our review identifies key research trends and offers speculative insights and research trajectories to enhance diagnostic performance. Additionally, we establish a consensus on best practices to mitigate persistent methodological issues, assisting the development of generalizable, reliable, and consistent results in medical diagnostics.
computer science, artificial intelligence