Improving Breast Cancer Diagnosis Accuracy by Particle Swarm Optimization Feature Selection

Reihane Kazerani
DOI: https://doi.org/10.1007/s44196-024-00428-5
IF: 2.259
2024-03-14
International Journal of Computational Intelligence Systems
Abstract: Breast cancer is one of the leading causes of death among women worldwide. Early detection of this disease can save patients' lives and reduce mortality. Because a large number of features are involved in diagnosis, the breast cancer diagnosis process can be time-consuming. To reduce cost and time and to improve the accuracy of breast cancer diagnosis, this paper proposes a feature selection algorithm based on particle swarm optimization (PSO), combined with machine learning methods, for selecting the most effective features for breast cancer diagnosis. To evaluate the proposed feature selection method, it was tested on three of the most common breast cancer datasets available in the University of California, Irvine (UCI) repository: the Coimbra dataset (CD), the Wisconsin Diagnostic Breast Cancer dataset (WDBC), and the Wisconsin Prognostic Breast Cancer dataset (WPBC). On the Coimbra dataset, with all 9 features and without PSO feature selection, the highest accuracy obtained was 87%, by the Support Vector Machine method; with PSO feature selection, the accuracy reached 91% and the number of features was reduced from 9 to 4. On the WDBC dataset, with all 30 features and without PSO feature selection, the highest accuracy obtained was 99%, by the Random Forest method; with PSO feature selection, the accuracy reached 100% and the number of features was reduced from 30 to 19. On the WPBC dataset, with all 33 features and without PSO feature selection, the highest accuracy obtained was 94%, by the Support Vector Machine method; with PSO feature selection, the accuracy reached 96% and the number of features was reduced from 33 to 17. These results indicate that the proposed PSO-based feature selection algorithm can improve the accuracy of breast cancer diagnosis while selecting fewer and more effective features than the full feature sets of the original datasets.
computer science, artificial intelligence, interdisciplinary applications
### What problem does this paper attempt to address?

The paper aims to improve the accuracy of breast cancer diagnosis through a Particle Swarm Optimization (PSO) feature selection algorithm. Specifically, the goal of the paper is to enhance the accuracy of breast cancer diagnosis by reducing the number of features in the dataset, thereby lowering diagnostic costs and time.

**Main contributions are as follows:**

1. **Proposed Feature Selection Algorithm**: The paper proposes a feature selection algorithm based on Particle Swarm Optimization (PSO), combined with machine learning methods to select the most effective features for breast cancer diagnosis.
2. **Experimental Validation**: To evaluate the effectiveness of the proposed feature selection method, the study was tested on three commonly used breast cancer datasets:
   - Coimbra dataset
   - Wisconsin Diagnostic Breast Cancer dataset (WDBC)
   - Wisconsin Prognostic Breast Cancer dataset (WPBC)

**Results Comparison:**

- In the Coimbra dataset, the original dataset contains 9 features, and the highest accuracy using the Support Vector Machine (SVM) method is 87%. After applying the PSO feature selection algorithm, the accuracy increased to 91%, and the number of features was reduced from 9 to 4.
- In the WDBC dataset, the original dataset contains 30 features, and the highest accuracy using the Random Forest (RF) method is 99%. After applying the PSO feature selection algorithm, the accuracy increased to 100%, and the number of features was reduced from 30 to 19.
- In the WPBC dataset, the original dataset contains 33 features, and the highest accuracy using the Support Vector Machine (SVM) method is 94%. After applying the PSO feature selection algorithm, the accuracy increased to 96%, and the number of features was reduced from 33 to 17.
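
The reported figures can be restated as relative feature reductions and accuracy gains in percentage points. The short sketch below only does that arithmetic on the numbers quoted above; the `results` list and the `summarize` helper are illustrative names, not part of the paper. The comparison is between the best reported accuracy with and without PSO feature selection on each dataset.

```python
# (dataset, features before, features after, accuracy % before, accuracy % after)
# Values are copied from the results reported above.
results = [
    ("Coimbra", 9, 4, 87, 91),
    ("WDBC", 30, 19, 99, 100),
    ("WPBC", 33, 17, 94, 96),
]


def summarize(rows):
    """Return (dataset, % feature reduction, accuracy gain in points) per row."""
    out = []
    for name, f_before, f_after, acc_before, acc_after in rows:
        reduction = 100.0 * (f_before - f_after) / f_before
        gain = acc_after - acc_before
        out.append((name, round(reduction, 1), gain))
    return out


for name, reduction, gain in summarize(results):
    print(f"{name}: {reduction}% fewer features, +{gain} accuracy points")
```

On these numbers, roughly a third to a half of the features are dropped on each dataset while accuracy rises by 1 to 4 points.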
**Conclusion**: The study demonstrates that the feature selection method based on the PSO algorithm can improve the accuracy of breast cancer diagnosis while reducing the number of features, thereby lowering computational costs and increasing efficiency.
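
The general idea of PSO-based feature selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a common binary PSO variant (sigmoid of the velocity as the probability of selecting a feature), a synthetic dataset instead of the UCI datasets, a simple nearest-centroid classifier in place of SVM or Random Forest, and a small penalty on the number of selected features. All function names, hyperparameters (`w`, `c1`, `c2`, swarm size, iterations), and the penalty weight are assumptions chosen for the example.

```python
import math
import random


def make_data(n=200, n_features=8, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data: only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X_tr, y_tr, X_val, y_val, mask):
    """Validation accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for x, t in zip(X_val, y_val):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c])) for c in (0, 1)}
        hits += int(min(dist, key=dist.get) == t)
    return hits / len(y_val)


def binary_pso(fitness, n_features, n_particles=12, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=1):
    """Maximize `fitness` over 0/1 feature masks with a binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, vel[i][j]))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))   # sigmoid transfer
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


X, y = make_data()
X_tr, y_tr, X_val, y_val = X[:140], y[:140], X[140:], y[140:]


def fitness(mask):
    """Validation accuracy minus a small (assumed) penalty per selected feature."""
    acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, mask)
    return acc - 0.01 * sum(mask) / len(mask)


best_mask, best_fit = binary_pso(fitness, n_features=8)
selected = [j for j, m in enumerate(best_mask) if m]
print("selected features:", selected, "fitness:", round(best_fit, 3))
```

Because the feature subset is chosen using the validation split, the fitness value is an optimistic estimate; a real evaluation would report accuracy on a separate held-out test set or use nested cross-validation.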