Hybrid approaches for handling imbalanced structured and unstructured data

Akshay M, Rishabh Manu, Rishika Raj, Suhas K T, Shobha K
DOI: https://doi.org/10.1007/s11042-024-20247-2
IF: 2.577
2024-10-01
Multimedia Tools and Applications
Abstract: With their potential for highly accurate decision-making, machine learning algorithms have come to influence the world of information systems. Algorithms designed for structured and unstructured data sets do not produce good results when the data set has an uneven distribution of classes (that is, when it is imbalanced). This is because these algorithms are designed to generalize from training samples and to produce the simplest hypothesis that fits the data. An uneven class distribution in the training stage yields low precision on the minority class(es) but high accuracy on the majority class(es), which can result in high-cost losses or even catastrophes. To address this problem, this work proposes hybrid balancing techniques. The proposed hybrid approaches have been empirically evaluated on both structured and unstructured imbalanced data sets. Empirical results show that the proposed hybrid approaches outperform state-of-the-art techniques, achieving improved mean recall rates of 40% and 20% on the Pima diabetes and Cerebral stroke structured data sets, respectively. The proposed method also performs better at balancing unstructured data, generating minority-class data with a similarity, measured using the Structural Similarity Index Measure (SSIM), ranging from 25% to 60% on the MNIST and Fashion-MNIST data sets. The resulting balanced data set is further evaluated using conventional classification algorithms, where the outcomes on the balanced data set show a performance improvement of 10% compared with the imbalanced data.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
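
The abstract's point that an imbalanced data set can give high overall accuracy while failing on the minority class can be illustrated with a small worked example. The sketch below is not the paper's method; it only shows why mean (macro-averaged) recall is a more informative metric than accuracy in this setting. The 90/10 class split, the always-predict-majority classifier, and the helper names `per_class_recall`, `mean_recall`, and `accuracy` are all assumptions made for illustration.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, where recall = correct predictions / actual instances."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def mean_recall(y_true, y_pred):
    """Unweighted average of per-class recall (macro recall)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced labels: 90 majority (0) and 10 minority (1) samples.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
macro = mean_recall(y_true, y_pred)

print(f"accuracy    = {acc:.2f}")
print(f"recall/class = {recalls}")
print(f"mean recall = {macro:.2f}")
```

In this toy case the classifier never detects a minority sample, yet its accuracy still looks high, which is the failure mode that balancing techniques aim to correct.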