Variable-Size Cooperative Coevolutionary Particle Swarm Optimization for Feature Selection on High-Dimensional Data

Xian-Fang Song, Yong Zhang, Yi-Nan Guo, Xiao-Yan Sun, Yong-Li Wang
DOI: https://doi.org/10.1109/tevc.2020.2968743
IF: 16.497
2020-10-01
IEEE Transactions on Evolutionary Computation
Abstract: Evolutionary feature selection (FS) methods face the challenge of the "curse of dimensionality" when dealing with high-dimensional data. Focusing on this challenge, this article studies a variable-size cooperative coevolutionary particle swarm optimization algorithm (VS-CCPSO) for FS. The proposed algorithm employs the idea of "divide and conquer" from the cooperative coevolutionary approach, but several newly developed problem-guided operators/strategies make it more suitable for FS problems. First, a space division strategy based on feature importance is presented, which can classify relevant features into the same subspace at a low computational cost. Following that, an adaptive adjustment mechanism of subswarm size is developed to maintain an appropriate size for each subswarm, with the purpose of saving computational cost on evaluating particles. Moreover, a particle deletion strategy based on fitness-guided binary clustering and a particle generation strategy based on feature importance and crossover are both designed to ensure the quality of particles in the subswarms. We apply VS-CCPSO to 12 typical datasets and compare it with six state-of-the-art methods. The experimental results show that VS-CCPSO has the capability of obtaining good feature subsets, suggesting its competitiveness for tackling FS problems with high dimensionality.
computer science, artificial intelligence, theory & methods
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the "curse of dimensionality" problem in feature selection on high-dimensional data. Specifically, the main challenges that evolutionary feature selection methods face on high-dimensional data are a significant increase in computational complexity and a decline in algorithm performance. To solve this problem, the authors propose a feature selection method based on the Variable-Size Cooperative Coevolutionary Particle Swarm Optimization (VS-CCPSO) algorithm.

#### The main contributions include:

1. **Proposing a Cooperative Coevolutionary Particle Swarm Optimization Algorithm**: By decomposing the high-dimensional feature selection problem into multiple low-dimensional subproblems, the scalability of the Particle Swarm Optimization (PSO) algorithm in handling high-dimensional data is significantly improved.
2. **Proposing a Space Partitioning Strategy Based on Feature Importance**: Unlike existing strategies, this method classifies relevant features into the same subspace by calculating the correlation between each feature and the class labels, without repeatedly calculating the correlation between pairs of features, thereby greatly reducing the computational cost of partitioning the feature space.
3. **Developing an Adaptive Adjustment Mechanism for Subswarm Size**: The mechanism automatically deletes redundant particles or adds new particles with good diversity according to the evolutionary state, maintaining an appropriate size for each subswarm, balancing convergence and diversity, and improving the efficiency of particle utilization.
4. **Proposing a Fitness-Guided Binary Clustering Particle Deletion Strategy**: The strategy finds particles with low fitness and low diversity at a lower computational cost, demonstrating its effectiveness and efficiency.
5. **Proposing a Particle Generation Strategy Based on Feature Importance and Crossover**: Since features with high importance are more likely to be included in new particles, the new particles have higher classification accuracy.

In summary, the method proposed in this paper aims to address the challenges posed by high-dimensional data through effective feature selection strategies, and its effectiveness and competitiveness are validated through experiments.
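To make the importance-based space partitioning and the importance-biased particle generation more concrete, here is a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: absolute Pearson correlation stands in for the paper's feature-importance measure, the ranking is cut into contiguous, roughly equal groups, and the crossover step of the generation strategy is left out. All function names (`abs_pearson`, `feature_importance`, `divide_space`, `generate_particle`) are hypothetical.

```python
import math
import random


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; a stand-in importance score (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information about the label
    return abs(cov) / math.sqrt(vx * vy)


def feature_importance(rows, labels):
    """Score every feature (column) against the class labels exactly once."""
    n_features = len(rows[0])
    return [abs_pearson([r[j] for r in rows], labels) for j in range(n_features)]


def divide_space(importance, n_groups):
    """Rank features by importance and cut the ranking into contiguous groups.

    No feature-to-feature correlation is computed. The equal-size cut is an
    assumption; it may yield fewer than n_groups groups when features are few.
    """
    ranked = sorted(range(len(importance)), key=lambda j: -importance[j])
    size = math.ceil(len(ranked) / n_groups)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def generate_particle(group, importance, rng):
    """Binary particle over one group; important features are selected more often."""
    top = max(importance[j] for j in group) or 1.0  # avoid division by zero
    return [1 if rng.random() < importance[j] / top else 0 for j in group]
```

Given a list of sample rows and a list of numeric class labels, one would call `feature_importance(rows, labels)`, pass the result to `divide_space` to obtain one feature group per subswarm, and then call `generate_particle(group, importance, random.Random(seed))` whenever a subswarm needs a new particle. Each feature is scored against the labels only once, which is where the low partitioning cost described in contribution 2 comes from.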