An improved Differential evolution with Sailfish optimizer (DESFO) for handling feature selection problem

Safaa. M. Azzam, O. E. Emam, Ahmed Sabry Abolaban
DOI: https://doi.org/10.1038/s41598-024-63328-w
IF: 4.6
2024-06-13
Scientific Reports
Abstract: Feature selection plays an important role as a preprocessing step for machine learning and data mining. It aims to streamline high-dimensional data by eliminating irrelevant and redundant features, which reduces the potential curse of dimensionality of a given large dataset. When working with datasets containing many features, algorithms that aim to identify the most valuable features to improve classification accuracy may encounter difficulties because of local optima. Many studies have been conducted to solve this problem, and one of the solutions is to use meta-heuristic techniques. This paper presents a combination of the Differential Evolution and Sailfish Optimizer algorithms (DESFO) to tackle the feature selection problem. To assess the effectiveness of the proposed algorithm, it is compared with Differential Evolution, the Sailfish Optimizer, and nine other modern optimization algorithms. The evaluation used Random Forest and k-nearest neighbors (KNN) classifiers as quality measures. The experimental results show that the proposed algorithm is superior to the others. It achieves high classification accuracy, reaching 85.7% with the Random Forest classifier and 100% with the k-nearest neighbors classifier across 14 multi-scale benchmarks. According to fitness values, it gained 71% with the Random Forest classifier and 85.7% with the k-nearest neighbors classifier.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses the **feature selection (FS) problem**, particularly the challenges that arise with high-dimensional data. Feature selection is a crucial preprocessing step in machine learning and data mining. Its purpose is to simplify high-dimensional data by eliminating irrelevant and redundant features, thereby reducing the impact of the "curse of dimensionality" and improving the accuracy of classification models.

#### Specific problem descriptions:

1. **Complexity of high-dimensional data**: When a dataset contains a large number of features, traditional feature selection algorithms may struggle to identify the most valuable features and are prone to getting trapped in local optima.
2. **Impact of noise and redundant features**: Noisy or redundant features in the dataset interfere with the model's learning, leading to overfitting or reduced classification accuracy.
3. **NP-hardness of the optimization problem**: Feature selection is widely regarded as a combinatorial optimization problem and is generally treated as NP-hard. The number of candidate feature subsets grows exponentially with the number of features (2^n subsets for n features), so exhaustive search quickly becomes infeasible and an efficient optimization algorithm is required.

#### Solutions:

To solve the above problems, the paper proposes a new hybrid algorithm, **DESFO**, which combines an improved Differential Evolution with the Sailfish Optimizer. Specific contributions include:

1. **Algorithm innovation**: Differential Evolution (DE) and the Sailfish Optimizer (SFO) are integrated and improved to create the new DESFO algorithm.
2. **Binary conversion**: A V-shaped function is used as the Transfer Function (TF) to convert continuous position values into binary format.
3. **Improvement of search strategy**: The Periodic Mode Boundary Handling (PMBH) method and a new Local Search (LS) strategy are introduced to enhance exploration and exploitation.
4. **Application area**: The DESFO algorithm is applied to feature selection in supervised classification tasks.
5. **Performance evaluation**: DESFO is evaluated using the average fitness, average accuracy, and average number of selected features, and is compared with existing algorithms using the Wilcoxon non-parametric rank-sum test.

#### Relevant background:

- **Differential Evolution (DE)**: A population-based stochastic optimization algorithm characterized by fast convergence and ease of implementation.
- **Sailfish Optimizer (SFO)**: A swarm-intelligence-based optimization algorithm that simulates sailfish hunting sardines and can achieve a good balance between global and local search.

With DESFO, the researchers aim for better search precision, faster convergence, and higher stability on the feature selection problem while avoiding entrapment in local optima. Experimental results show that DESFO outperforms other modern optimization algorithms on multiple benchmark datasets.
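
The binary conversion and boundary handling steps listed under Solutions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which V-shaped function DESFO uses, so `|tanh(x)|` is assumed; the thresholding rule in `binarize` is one common choice; `periodic_wrap` shows the usual wrap-around reading of periodic-mode boundary handling; and `wrapper_fitness` with `alpha = 0.99` is a hypothetical weighting of the kind often used in wrapper feature selection. All function names are invented for this example.

```python
import math
import random
from typing import List, Sequence


def v_shaped(x: float) -> float:
    """V-shaped transfer function. |tanh(x)| is an ASSUMED variant."""
    return abs(math.tanh(x))


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Turn a continuous position into a 0/1 feature mask.

    ASSUMED rule: feature j is selected when T(x_j) exceeds a uniform draw.
    """
    return [1 if v_shaped(x) > rng.random() else 0 for x in position]


def periodic_wrap(x: float, lower: float, upper: float) -> float:
    """Periodic-mode boundary handling, read as wrap-around.

    A coordinate that leaves [lower, upper] re-enters from the opposite
    side, as if the search range repeated periodically.
    """
    width = upper - lower
    return lower + (x - lower) % width


def wrapper_fitness(error_rate: float, mask: Sequence[int],
                    alpha: float = 0.99) -> float:
    """HYPOTHETICAL fitness: weighted sum of classifier error and the
    fraction of selected features (lower is better)."""
    selected_ratio = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


if __name__ == "__main__":
    rng = random.Random(0)
    # A candidate solution that has drifted outside the range [-1, 1].
    raw = [1.5, -1.5, 0.25, -0.9]
    wrapped = [periodic_wrap(x, -1.0, 1.0) for x in raw]
    mask = binarize(wrapped, rng)
    print("wrapped position:", wrapped)
    print("feature mask:", mask)
    # error_rate would come from a classifier such as RF or KNN trained
    # on the selected features; 0.1 is a placeholder value.
    print("fitness:", wrapper_fitness(0.1, mask))
```

For instance, `periodic_wrap(1.5, -1.0, 1.0)` returns `-0.5`: the coordinate overshoots the upper bound by 0.5 and re-enters 0.5 above the lower bound. Wrapping rather than clipping avoids piling candidates up on the boundary, which is one reason this mode is used to preserve diversity.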