Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification

Suja A. Alex, J. Jesu Vedha Nayahi, Sanaa Kaddoura
DOI: https://doi.org/10.1016/j.asoc.2024.111491
IF: 8.7
2024-05-01
Applied Soft Computing
Abstract: Imbalanced data classification is a challenge in machine learning because it biases model learning. Data dimensionality poses another challenge, as it strongly affects classifier performance. This paper proposes a new deep-learning method that combines feature selection with oversampling to address both challenges. The proposed approach, GA-SMOTE-DCNN, integrates a genetic algorithm (GA) for feature selection, SMOTE for oversampling, and a deep 1D convolutional neural network (DCNN) for classification. The study finds that splitting the data into training and testing sets before applying SMOTE yields higher accuracy than splitting after SMOTE, with an improvement of between 1.94% and 3.98% on each dataset. In comparisons with benchmark methods, the method achieved accuracies of 86.81% on the Balance Scale dataset, 86.15% on the Oil Spill dataset, 89.21% on the Yeast dataset, 91.32% on the Mammography dataset, 88.23% on the Australian Credit dataset, and 89.53% on the German Credit dataset, underscoring its significance for high-dimensional and imbalanced data classification problems. The method also demonstrates scalability to such problems across various domains.
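The split-before-oversampling order described in the abstract can be illustrated with a minimal pure-Python sketch. Everything named here is an assumption for illustration rather than the authors' implementation: the helpers `smote_like` and `split_then_oversample`, the simplified SMOTE interpolation, the stratified 80/20 split, `k=3`, and the toy 2-D data. The GA feature-selection step and the DCNN classifier are not shown.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Simplified SMOTE (assumption, not the paper's code): each synthetic
    point lies on the segment between a minority sample and one of its
    k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to `base`.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def split_then_oversample(X, y, test_fraction=0.2, minority_label=1, seed=0):
    """Hold out a stratified test set first, then oversample the training
    split only, so no synthetic point can reach the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    # Balance the training classes using training-split minority points only.
    minority = [x for x, v in zip(X_train, y_train) if v == minority_label]
    n_new = max(0, (len(y_train) - len(minority)) - len(minority))
    new_points = smote_like(minority, n_new, rng=rng) if len(minority) > 1 else []
    return (
        X_train + new_points,
        y_train + [minority_label] * len(new_points),
        X_test,
        y_test,
    )


# Toy imbalanced data (hypothetical): 40 majority points, 10 minority points.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

X_tr, y_tr, X_te, y_te = split_then_oversample(X, y)
print("train class counts:", y_tr.count(0), y_tr.count(1))
print("test class counts:", y_te.count(0), y_te.count(1))
```

In this sketch the test set contains only original samples and keeps its original class imbalance, while the training set is balanced with synthetic minority points. Applying the oversampler before the split would instead let interpolated points, built partly from samples that end up in the test set, appear on both sides of the split.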
computer science, artificial intelligence, interdisciplinary applications