Analysis and comparison of feature selection methods towards performance and stability
Matheus Cezimbra Barbieri, Bruno Iochins Grisci, Márcio Dorn
DOI: https://doi.org/10.1016/j.eswa.2024.123667
IF: 8.5
2024-03-22
Expert Systems with Applications
Abstract: The amount of gathered data is increasing at unprecedented rates for machine learning applications such as natural language processing, computer vision, and bioinformatics. This increase implies a higher number of samples and features, and with it come the problems of high-dimensional data: the curse of dimensionality, small sample sizes, noisy or redundant features, and biased data. Feature selection is fundamental to dealing with such problems. It reduces the data dimensionality by selecting the most relevant and least redundant features, thereby reducing the computational cost, improving accuracy, and enhancing the data's interpretability to machine learning models and domain experts. However, there are several selector options from which to choose. This work compares some of the most representative algorithms from different feature selection groups across a broad range of measures, several datasets, and different strategies. We employ metrics to appraise the selection accuracy, selection redundancy, prediction performance, algorithmic stability, selection reliability, and computational time of several feature selection algorithms. We developed and shared a new open Python framework to benchmark the algorithms. The results highlight the strengths and weaknesses of these algorithms and can guide their application.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
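
Of the measures named in the abstract, algorithmic stability is perhaps the least self-explanatory: it asks how much the selected feature subset changes when the same selector is run on perturbed versions of the data. The abstract does not say which stability metric the authors use, so the following is only a minimal Python sketch under the assumption that stability is scored as the mean pairwise Jaccard similarity between selected subsets. The function names and the example subsets are hypothetical and are not taken from the authors' framework.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of feature indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty selections count as identical
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Mean pairwise Jaccard similarity over all pairs of selected subsets.

    1.0 means every run selected exactly the same features;
    values near 0.0 mean the runs barely overlap.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices chosen by one selector on three resampled
# versions of the same dataset (illustrative values, not real results).
runs = [
    [0, 3, 7, 9],
    [0, 3, 7, 8],
    [0, 3, 5, 9],
]
print(round(selection_stability(runs), 3))  # prints 0.511
```

A score like this depends on the subset size and the total number of features, so comparisons are most meaningful between selectors asked to return the same number of features on the same data.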