An evolutionary algorithm‐based classification method for high‐dimensional imbalanced mixed data with missing information

Yi Liu, Gengsong Li, Qibin Zheng, Guoli Yang, Kun Liu, Wei Qin
DOI: https://doi.org/10.1049/ell2.70052
2024-10-16
Electronics Letters
Abstract: This manuscript proposes PSOHIM, an imputation and classification method based on an evolutionary algorithm, to solve the challenging problem of classifying high‐dimensional missing data with mixed variables. PSOHIM introduces a two‐stage multiple feature selection strategy to eliminate high‐dimensionality issues, proposes a mixed attribute imputation method that generates different imputation models for continuous and discrete features, and uses quantum oversampling to sample instances so that the data are balanced. In addition, PSOHIM adopts particle swarm optimization (PSO) to optimize the parameters of the mixed attribute imputation and quantum oversampling components in order to obtain better classification models.

The scale of data keeps growing rapidly, and the majority of it is high‐dimensional imbalanced data, which is hard to classify. Missing data, which often occurs in software, further aggravates the difficulty of classification. To address the classification of high‐dimensional imbalanced mixed‐variable data with missing values, a novel method based on particle swarm optimization is developed. It has three original components: multiple feature selection, mixed attribute imputation, and quantum oversampling. Multiple feature selection uses a two‐stage strategy to obtain stable, relevant features. Mixed attribute imputation separates features into continuous and discrete ones and fills missing values with different models. Quantum oversampling chooses instances to balance the data based on a quantum operator. Furthermore, particle swarm optimization is employed to optimize the parameters of these components to obtain better classification results. Exhaustive experiments were conducted on six representative classification datasets, against six typical algorithms, using four evaluation measures; the results indicate that the proposed method is superior to the comparison algorithms.
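The abstract does not say which imputation models PSOHIM builds for each feature type. As a rough illustration of the general idea of treating continuous and discrete features differently, the following minimal sketch substitutes a k‐nearest‐neighbour imputer (an assumption, not the paper's method): neighbours are ranked with a Gower‐style distance, continuous gaps are filled with the neighbours' mean, and discrete gaps with the neighbours' most common value. The function names `knn_impute_mixed` and `_gower_distance` and the neighbour count `k` are hypothetical; `k` merely stands in for the kind of imputation parameter a PSO loop could tune.

```python
from collections import Counter
from statistics import mean
from typing import AbstractSet, List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing entry


def _gower_distance(a: Sequence[Value], b: Sequence[Value],
                    discrete_cols: AbstractSet[int], ranges: List[float]) -> float:
    """Mean per-feature distance over the features observed in both rows."""
    total, count = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip features missing in either row
        if j in discrete_cols:
            total += 0.0 if x == y else 1.0  # mismatch distance for categories
        elif ranges[j] > 0:
            total += abs(x - y) / ranges[j]  # range-normalised difference
        count += 1  # a constant continuous feature contributes 0 but still counts
    return total / count if count else float("inf")


def knn_impute_mixed(rows: Sequence[Sequence[Value]],
                     discrete_cols: AbstractSet[int], k: int = 3) -> List[List[Value]]:
    """Hypothetical mixed imputer: neighbour mean (continuous), neighbour mode (discrete)."""
    n_cols = len(rows[0])
    ranges: List[float] = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if j in discrete_cols or not observed:
            ranges.append(0.0)
        else:
            ranges.append(max(observed) - min(observed))

    imputed: List[List[Value]] = []
    for i, row in enumerate(rows):
        new_row = list(row)  # copy so the input is not mutated
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that observe feature j, nearest first (ties broken by index).
            donors = sorted(
                (_gower_distance(row, other, discrete_cols, ranges), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"no donor row observes feature {j}")
            values = [rows[idx][j] for _, idx in donors]
            if j in discrete_cols:
                new_row[j] = Counter(values).most_common(1)[0][0]  # discrete: mode
            else:
                new_row[j] = mean(values)  # continuous: mean
        imputed.append(new_row)
    return imputed
```

For example, with rows `[[1.0, "a", 10.0], [2.0, "a", None], [9.0, "b", 90.0], [None, "b", 80.0], [1.5, None, 12.0]]`, column 1 declared discrete and `k=2`, the missing entries become `11.0`, `5.25` and `"a"`. In a PSOHIM‐like pipeline, `k` could be one coordinate of a particle's position and be scored by downstream classification performance; that wiring is not shown here.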
engineering, electrical & electronic