Bimodal Distribution Removal and Genetic Algorithm in Neural Network for Breast Cancer Diagnosis

Ke Quan
DOI: https://doi.org/10.48550/arXiv.2002.08729
2020-02-20
Abstract: Diagnosis of breast cancer has been well studied in the past. Multiple linear programming models have been devised to approximate the relationship between cell features and tumour malignancy. However, these models are less capable of handling non-linear correlations. Neural networks, by contrast, are powerful in processing complex non-linear correlations. It is thus beneficial to approach this cancer diagnosis problem with a model based on a neural network. In particular, introducing bias to the neural network training process is deemed an important means of increasing training efficiency. Out of a number of popular proposed methods for introducing artificial bias, Bimodal Distribution Removal (BDR) presents ideal efficiency improvement results and fair simplicity in implementation. However, this paper examines the effectiveness of BDR against the target cancer diagnosis classification problem and shows that the BDR process in fact negatively impacts classification performance. In addition, this paper explores the genetic algorithm as an efficient tool for feature selection, which produced significantly better results compared to a baseline model without any feature selection in place.
Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing, Image and Video Processing
What problem does this paper attempt to address?
The main problems that this paper attempts to solve include:

1. **Handling Non-linear Relationships in Breast Cancer Diagnosis**:
   - Traditional linear programming models perform poorly when dealing with the complex non-linear relationships between cell features and tumor malignancy. Neural Networks (NN), due to their strong non-linear processing capabilities, are considered more suitable for the classification diagnosis of breast cancer.
   - Formula representation: Suppose \( f(x) \) is the relationship function between cell feature \( x \) and tumor malignancy. Then the linear model assumes \( f(x) = w^{T}x + b \), while the neural network can capture more complex non-linear relationships \( f(x;\theta) \), where \( \theta \) represents network parameters.
2. **Feature Selection and Noise Data Processing**:
   - Two techniques are introduced in the study to improve the performance of neural networks: Bimodal Distribution Removal (BDR) and Genetic Algorithm (GA). BDR aims to remove abnormal patterns during the training process, and GA is used to select the most representative features to improve the generalization ability and prediction accuracy of the model.
3. **Evaluating the Effectiveness of BDR and GA**:
   - The paper verifies the influence of BDR and GA on the performance of neural networks through experiments. Specifically, the researchers hope to confirm:
     - Whether BDR can effectively remove noise data, thereby improving the generalization ability of the model.
     - Whether GA can effectively screen out features that contribute to tumor malignancy, thereby improving the accuracy and training efficiency of the model.

### Summary of Main Research Contents

- **Background and Motivation**:
  - Traditional linear programming models have limitations in dealing with breast cancer diagnosis problems, especially in handling complex non-linear relationships. Therefore, using neural networks for classification is a better choice.
- **Methods**:
  - Use the Wisconsin Breast Cancer Diagnosis dataset for experiments.
  - Design a three-layer neural network and conduct comparative experiments with three models: the control group, experimental group B (applying BDR), and experimental group G (applying GA).
  - Compare the classification performance of different models and evaluate the effects of BDR and GA.
- **Results and Discussion**:
  - **Effectiveness of GA**:
    - The experimental results show that GA can significantly improve classification accuracy, especially when the number of remaining features is between 16 and 20. In addition, GA also reduces the number of hidden-layer neurons required, from about 40 to about 28.
  - **Effectiveness of BDR**:
    - The experimental results indicate that BDR does not significantly improve classification accuracy, and in some cases, it even reduces performance. This may be because the removed patterns are not real noise data but meaningful input information.

### Conclusion

This paper verifies the effectiveness of GA in feature selection through experiments, but BDR does not achieve the expected effect in removing noise data and may even mistakenly remove some useful information. Therefore, more caution is required in the application of BDR, especially when determining whether the bimodal distribution is truly caused by noise.
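
The summary does not spell out how BDR decides which patterns to drop, so the following is only a minimal sketch of one commonly described form of the removal step, not the paper's implementation. It assumes a per-pattern error is available for each training example, takes the patterns whose error exceeds the training-set mean as the candidate high-error group, and removes those at or above that group's mean plus `alpha` standard deviations. The function name `bdr_outlier_indices`, the parameter `alpha`, and the sample error values are all hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev


def bdr_outlier_indices(errors, alpha=0.5):
    """Return indices of training patterns a BDR-style pass would remove.

    errors: per-pattern training errors for the current epoch.
    alpha:  assumed tuning constant in [0, 1] scaling the std-dev margin.
    """
    overall_mean = mean(errors)
    # Candidate subset: patterns whose error exceeds the training-set mean.
    high = [e for e in errors if e > overall_mean]
    if len(high) < 2:
        return []  # no distinct high-error group, so nothing is removed
    # Assumed removal rule: error >= subset mean + alpha * subset std-dev.
    threshold = mean(high) + alpha * pstdev(high)
    return [i for i, e in enumerate(errors) if e >= threshold]


# Hypothetical errors: a low-error peak plus a small high-error group.
errors = [0.02, 0.03, 0.01, 0.04, 0.02, 0.60, 0.70, 0.95]
removed = bdr_outlier_indices(errors)
kept = [e for i, e in enumerate(errors) if i not in removed]
print(removed, len(kept))
```

Note that the rule only looks at error magnitude: it removes the highest-error patterns whether they are noise or simply hard but valid cases, which is consistent with the concern raised in the conclusion.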