Constructing small sample datasets with game mixed sampling and improved genetic algorithm

Bailin Zhu, Hongliang Wang, Mi Fan
DOI: https://doi.org/10.1007/s11227-024-06263-x
IF: 3.3
2024-06-06
The Journal of Supercomputing
Abstract: Classifying imbalanced data is an increasingly common problem. Although existing methods have made notable progress on imbalanced data, two challenges persist: datasets that are too large and datasets that contain low-quality data. We propose a hybrid approach that combines game mixed sampling with an improved genetic algorithm to address excessive dataset size and low data quality in imbalanced classification problems. In the game mixed sampling module, we use a game-based strategy to identify the optimal hybrid sampling method and sampling ratio for the current dataset, aiming to obtain a diverse sample that covers the dataset comprehensively. In the improved genetic algorithm module, we integrate a group of classifiers, encode the performance metrics of the sampled data, and comprehensively evaluate the fitness of each data point. Selection operations retain the population data of many high-fitness individuals. Crossover and mutation operations are then applied to the population data to explore the search space, and the minimum stable population size is determined with a sliding standard deviation. On real-world credit card fraud data, which are highly imbalanced, the combined approach achieves small dataset sizes and high evaluation scores, outperforms existing methods, and demonstrates the effectiveness of game mixed sampling and the improved genetic algorithm.
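The abstract describes the improved genetic algorithm only at a high level, so what follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes individuals are data points given as numeric feature vectors and takes a caller-supplied `fitness` callable as a stand-in for the paper's classifier-group score; truncation selection, arithmetic crossover and Gaussian mutation are generic operators swapped in for illustration. `min_stable_size` shows one possible reading of the sliding-standard-deviation criterion: given a score measured at each candidate population size, it returns the smallest size from which a window of consecutive scores varies by less than a tolerance. The names `ga_step` and `min_stable_size` and the parameters `keep`, `mutation_sd`, `window` and `tol` are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical sketch: individuals are data points (feature vectors), and
# `fitness` stands in for the paper's classifier-group score (an assumption).
Point = List[float]


def ga_step(population: List[Point], fitness: Callable[[Point], float],
            keep: int, mutation_sd: float, rng: random.Random) -> List[Point]:
    """Run one generation and return a population of the same size.

    Requires keep >= 2 so that two distinct parents can be drawn.
    """
    # Truncation selection: retain the `keep` fittest individuals unchanged.
    elite = sorted(population, key=fitness, reverse=True)[:keep]
    children: List[Point] = []
    while len(elite) + len(children) < len(population):
        a, b = rng.sample(elite, 2)
        # Arithmetic crossover: a random convex combination of two parents.
        w = rng.random()
        child = [w * x + (1 - w) * y for x, y in zip(a, b)]
        # Gaussian mutation: perturb every feature to explore nearby points.
        children.append([x + rng.gauss(0.0, mutation_sd) for x in child])
    return elite + children


def min_stable_size(sizes: Sequence[int], scores: Sequence[float],
                    window: int = 3, tol: float = 0.01) -> int:
    """Smallest population size from which `window` consecutive scores
    (ordered by size) have a population standard deviation below `tol`.

    Falls back to the largest size if no window is stable.
    """
    pairs = sorted(zip(sizes, scores))
    for i in range(len(pairs) - window + 1):
        win = [score for _, score in pairs[i:i + window]]
        if statistics.pstdev(win) < tol:
            return pairs[i][0]
    return pairs[-1][0]
```

For example, with scores 0.70, 0.80, 0.86, 0.87, 0.87 and 0.88 measured at sizes 100 through 600, `min_stable_size` returns 300 under the default window of 3 and tolerance of 0.01, because 0.86, 0.87, 0.87 is the first window whose standard deviation (about 0.005) falls below the tolerance. Keeping the elite unchanged also means the best fitness in the population never decreases from one generation to the next.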
computer science, theory & methods, engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?