Abstract: Feature selection plays a pivotal role in the data preprocessing and model-building pipeline, significantly enhancing model performance, interpretability, and resource efficiency across diverse domains. In population-based optimization methods, the generation of diverse individuals holds utmost importance for adequately exploring the problem landscape, particularly in highly multi-modal multi-objective optimization problems. Our study reveals that, in line with findings from several prior research papers, commonly employed crossover and mutation operations lack the capability to generate high-quality diverse individuals and tend to become confined to limited areas around various local optima. This paper introduces an augmentation to the diversity of the population in the well-established multi-objective scheme of the genetic algorithm, NSGA-II. This enhancement is achieved through two key components: the genuine initialization method and the substitution of the worst individuals with new randomly generated individuals as a re-initialization approach in each generation. The proposed multi-objective feature selection method undergoes testing on twelve real-world classification problems, with the number of features ranging from 2,400 to nearly 50,000. The results demonstrate that replacing the last front of the population with an equivalent number of new random individuals generated using the genuine initialization method and featuring a limited number of features substantially improves the population's quality and, consequently, enhances the performance of the multi-objective algorithm.
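The re-initialization step described in the abstract (replacing the last front with an equal number of sparse random individuals) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes two minimized objectives per individual (for example classification error and the fraction of selected features), and the helper names `last_front`, `sparse_random_individual`, `replace_last_front` and the parameter `max_selected` are hypothetical. The sparse random mask is only a stand-in for the paper's genuine initialization method, whose details are not given here.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives minimized)."""
    return bool(np.all(a <= b) and np.any(a < b))

def last_front(objs):
    """Indices of the last front, found by peeling off non-dominated fronts in turn."""
    remaining = list(range(len(objs)))
    front = []
    while remaining:
        # Current front: individuals not dominated by anyone still remaining.
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        remaining = [i for i in remaining if i not in front]
    return front

def sparse_random_individual(n_features, max_selected, rng):
    """Random binary mask with between 1 and max_selected features switched on.

    Assumption: a simple stand-in for the paper's initialization method.
    """
    k = int(rng.integers(1, max_selected + 1))
    mask = np.zeros(n_features, dtype=bool)
    mask[rng.choice(n_features, size=k, replace=False)] = True
    return mask

def replace_last_front(pop, objs, max_selected, rng):
    """Return a copy of pop with the last front replaced by new sparse individuals."""
    new_pop = pop.copy()
    worst = last_front(objs)
    for i in worst:
        new_pop[i] = sparse_random_individual(pop.shape[1], max_selected, rng)
    return new_pop, worst
```

In this sketch `pop` is a boolean array of shape `(population_size, n_features)` and `objs` holds one row of objective values per individual. The replaced individuals would need to be re-evaluated before the next selection step. When the whole population forms a single front, that front is also the last one, so a real implementation would likely need a guard for that case.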

What problem does this paper attempt to address?

### The problems the paper attempts to solve

The paper addresses insufficient population diversity in multi-objective feature selection. In population-based optimization methods, generating diverse individuals is essential for fully exploring the problem space, especially in highly multi-modal multi-objective optimization problems. However, the commonly used crossover and mutation operations cannot reliably generate high-quality diverse individuals and tend to become trapped in limited areas around local optima. The paper therefore proposes a way to enhance population diversity in the NSGA-II algorithm through two key components: a genuine initialization method, and a re-initialization step that replaces the worst individuals with newly generated random individuals in each generation. The method was tested on 12 real-world classification problems with the number of features ranging from 2,400 to nearly 50,000. The results show that replacing the worst individuals with new random individuals generated by the genuine initialization method significantly improves the quality of the population and thereby the performance of the multi-objective algorithm.

### Specific problem description

1. **Importance of feature selection**:
   - Feature selection plays a crucial role in the data preprocessing and model-building pipeline and can significantly improve model performance, interpretability, and resource efficiency.
   - Removing irrelevant and redundant information reduces computational requirements, improves classifier performance by alleviating the curse of dimensionality, and simplifies model interpretation.
2. **Limitations of existing methods**:
   - The commonly used crossover and mutation operations cannot reliably generate high-quality diverse individuals and tend to become trapped around local optima.
   - This limitation is particularly evident in highly multi-modal multi-objective optimization problems, where the problem space must be explored thoroughly to find the global optimal solution.
3. **The paper's solution**:
   - Proposes a way to enhance population diversity in the NSGA-II algorithm through two key components:
     - **Genuine initialization method**: Ensures that the initial population has a high degree of diversity.
     - **Replacing the worst individuals**: Replaces the worst individuals (the last front of the population) with newly generated random individuals in each generation to further enhance population diversity.
   - With these components, the algorithm can explore a wider search space, avoid premature convergence, and thus improve the performance of multi-objective optimization.

### Experimental results

- **Experimental setup**:
  - 12 real-world classification problems are used for testing, with the number of features ranging from 2,400 to nearly 50,000.
  - Each algorithm is run 31 times. In each run, 20% of the data is randomly selected as the test set (repeated random sub-sampling validation, also known as Monte Carlo cross-validation).
  - The number of function calls is fixed at 15,000 to ensure a fair comparison.
- **Performance evaluation**:
  - Hypervolume (HV) is used as the multi-objective evaluation metric, with the reference point set to (1, 1).
  - The results show that the proposed method achieves a significantly better HV value than the conventional NSGA-II on all data sets, with an average HV value reaching 0.97.
  - Replacing the worst individuals significantly increases population diversity and improves the exploration ability of the algorithm.

### Conclusion

The paper effectively addresses insufficient diversity in multi-objective feature selection by introducing the genuine initialization method and the strategy of replacing the worst individuals, and it significantly improves the performance of the NSGA-II algorithm. The experimental results on multiple real-world data sets show that this method explores the search space better, avoids premature convergence, and thus obtains better multi-objective optimization solutions.
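To make the hypervolume metric with reference point (1, 1) concrete, the following is a minimal sketch of a two-objective hypervolume computation. It assumes both objectives are minimized and scaled to [0, 1], which is consistent with the reference point above but not confirmed in detail by the summary. The function name `hypervolume_2d` is hypothetical, and the paper may well have used a library implementation instead.

```python
import numpy as np

def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Area dominated by a set of 2-D points and bounded by ref (minimization)."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    # Only points strictly better than the reference point contribute.
    pts = pts[np.all(pts < np.asarray(ref, dtype=float), axis=1)]
    if len(pts) == 0:
        return 0.0
    # Sort by the first objective (ascending), ties broken by the second.
    pts = pts[np.lexsort((pts[:, 1], pts[:, 0]))]
    hv = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        # A point adds area only if it improves on the best second objective so far.
        if f2 < best_f2:
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv
```

For example, `hypervolume_2d([(0.2, 0.6), (0.5, 0.3)])` evaluates to approximately 0.47, and adding a dominated point such as `(0.6, 0.7)` leaves the value unchanged. An HV close to 1 means the front lies close to the ideal point (0, 0) under these assumptions.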

Enhancing Diversity in Multi-objective Feature Selection

A Multi-objective Feature Selection Method Considering the Interaction Between Features

MPF-FS: A multi-population framework based on multi-objective optimization algorithms for feature selection

Compact NSGA-II for Multi-objective Feature Selection

Fair Feature Subset Selection using Multiobjective Genetic Algorithm

Feature Selection Using Diversity-Based Multi-objective Binary Differential Evolution

Multi-objective Binary Coordinate Search for Feature Selection

Maintaining Diversity Provably Helps in Evolutionary Multimodal Optimization

Differential Evolution Based Feature Selection: A Niching-based Multi-objective Approach

An evolutionary multiobjective method based on dominance and decomposition for feature selection in classification

Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach

A metaheuristic multi-objective interaction-aware feature selection method

Feature Selection Based on Hybridization of Genetic Algorithm and Competitive Swarm Optimizer

Evolutionary Multi-Objective Diversity Optimization

A problem-specific non-dominated sorting genetic algorithm for supervised feature selection

Adaptive crossover operator based multi-objective binary genetic algorithm for feature selection in classification

An enhanced fast non-dominated solution sorting genetic algorithm for multi-objective problems

A many-objective evolutionary algorithm under diversity-first selection based framework

A multi-objective optimization algorithm for feature selection problems

Niching Diversity Estimation for Multi-modal Multi-objective Optimization

A binary individual search strategy-based bi-objective evolutionary algorithm for high-dimensional feature selection