Particle swarm optimization based on filter-based population initialization method for feature selection in classification

Yu Xue, Xu Cai, Weiwei Jia
DOI: https://doi.org/10.1007/s12652-022-04444-1
IF: 3.662
2022-10-28
Journal of Ambient Intelligence and Humanized Computing
Abstract: In classification problems, datasets often contain many irrelevant or redundant features, which may degrade the performance of the classifier. Feature selection (FS) is an effective approach to this kind of problem. Particle Swarm Optimization (PSO) has already been applied to FS problems, and researchers have designed various methods to improve PSO so that it solves them more efficiently. On the one hand, it has been verified that a good initial population can significantly enhance the performance of PSO and increase its convergence speed. On the other hand, many traditional filter methods that provide heuristic information about features have been proposed. However, few studies improve the performance of PSO for FS by basing its initialization method on filter methods. Therefore, in this paper, Relief, Information gain and Fisher score are introduced to initialize the population of PSO on FS problems. In the proposed method, each filter method is first employed to evaluate and rank the features. Then, a probability value is assigned to each feature according to its rank. After that, an initial population is generated based on the probabilities of the features. In this way, three initial populations are obtained separately from the three filter methods. Finally, they are merged into the final initial population. Comparative experiments are conducted on nine datasets, and PSO with the proposed initialization method is compared with PSO using two other initialization methods. The results indicate that the proposed initialization method can greatly enhance the search ability and increase the convergence speed of PSO for solving FS problems, especially large-scale FS problems.
computer science, information systems, telecommunications, artificial intelligence
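
The initialization steps described in the abstract (rank the features with each filter, map ranks to probabilities, sample a sub-population per filter, merge) can be illustrated with a minimal Python sketch. The abstract does not give the rank-to-probability mapping, the particle encoding, or the merge rule, so everything specific here is an assumption: a linear mapping from rank to a probability between the hypothetical bounds `p_high` and `p_low`, binary particles (1 = feature selected), and a merge that concatenates roughly equal-sized sub-populations. The filter scores are taken as precomputed inputs rather than computed by Relief, Information gain or Fisher score, and all function names are illustrative.

```python
import random


def rank_probabilities(scores, p_high=0.9, p_low=0.1):
    """Map filter scores to selection probabilities by rank.

    Assumption: the best-ranked feature gets p_high, the worst gets p_low,
    and probabilities fall linearly with rank in between.
    """
    n = len(scores)
    # Feature indices sorted from highest score (rank 0) to lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    probs = [0.0] * n
    for rank, idx in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.0
        probs[idx] = p_high - frac * (p_high - p_low)
    return probs


def init_subpopulation(probs, size, rng):
    """Sample `size` binary particles; bit j is 1 with probability probs[j]."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(size)]


def init_population(score_sets, swarm_size, seed=0):
    """Build one sub-population per filter and merge them by concatenation.

    Assumption: each filter contributes a near-equal share of the swarm.
    """
    rng = random.Random(seed)
    base, extra = divmod(swarm_size, len(score_sets))
    population = []
    for k, scores in enumerate(score_sets):
        size = base + (1 if k < extra else 0)
        probs = rank_probabilities(scores)
        population.extend(init_subpopulation(probs, size, rng))
    return population


# Made-up scores for four features from three filters (illustrative only).
relief = [0.8, 0.1, 0.4, 0.05]
info_gain = [0.6, 0.2, 0.5, 0.1]
fisher = [1.2, 0.3, 0.9, 0.2]
population = init_population([relief, info_gain, fisher], swarm_size=30)
```

Under these assumptions, highly ranked features appear in most initial particles while poorly ranked ones still appear occasionally, which keeps some diversity in the swarm. A PSO variant with continuous positions would need an extra step to turn these bits into position values.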