An Example of Performance Comparison of Supervised Machine Learning Algorithms Before and After PCA and LDA Application: Breast Cancer Detection

Mete Yaganoglu, Seda Kaya
DOI: https://doi.org/10.1109/ASYU50717.2020.9259883
2020-10-15
Abstract: Rapid and accurate diagnosis of common diseases such as breast cancer, which is widespread among women, is of great importance. Although the diagnosis is made by specialist doctors, machine learning algorithms are being studied as a way to assist them. Machine learning algorithms make inferences from existing data and predict unknown outcomes. Supervised machine learning algorithms, which are used to classify categorical data and to label new data, are frequently applied to current problems. This study uses the University of Wisconsin breast cancer dataset, which contains 357 benign and 212 malignant samples and whose features were extracted from measurements made on images of breast masses. After the necessary pre-processing and normalization steps, the dataset was split into training and test sets, and six supervised machine learning algorithms were trained (k-nearest neighbors, random forest, decision trees, naive Bayes, support vector machines, and logistic regression). The same procedure was then repeated after applying principal component analysis (PCA) and linear discriminant analysis (LDA) to the dataset. The accuracy of all models increased after applying linear discriminant analysis, and an accuracy of 96.49% was achieved with logistic regression. The aim was to select the appropriate algorithm and to provide observations that can serve as a reference for future studies.
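The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, uses the copy of the Wisconsin diagnostic dataset that ships with it (`load_breast_cancer`), and picks `StandardScaler` for normalization, a stratified 80/20 split, two PCA components, and default classifier hyperparameters. The abstract specifies none of these choices, so the scores it produces should not be expected to match the reported 96.49%.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_classifiers(seed=0):
    """Fresh instances of the six classifiers named in the abstract.

    Hyperparameters are library defaults (an assumption, not from the paper).
    """
    return {
        "knn": KNeighborsClassifier(),
        "random_forest": RandomForestClassifier(random_state=seed),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "naive_bayes": GaussianNB(),
        "svm": SVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }


def make_reducer(name):
    """Return a dimensionality-reduction step, or None for the baseline."""
    if name == "pca":
        # Two components is an illustrative choice.
        return PCA(n_components=2)
    if name == "lda":
        # With two classes, LDA can produce at most one component.
        return LinearDiscriminantAnalysis(n_components=1)
    return None


def compare(test_size=0.2, seed=0):
    """Test accuracy for each (reducer, classifier) pair on a held-out split."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    results = {}
    for reducer_name in ("none", "pca", "lda"):
        for clf_name, clf in make_classifiers(seed).items():
            steps = [StandardScaler()]  # assumed normalization step
            reducer = make_reducer(reducer_name)
            if reducer is not None:
                steps.append(reducer)
            steps.append(clf)
            # The pipeline fits scaler and reducer on training data only.
            model = make_pipeline(*steps)
            model.fit(X_train, y_train)
            results[(reducer_name, clf_name)] = model.score(X_test, y_test)
    return results


if __name__ == "__main__":
    for (reducer_name, clf_name), acc in sorted(compare().items()):
        print(f"{reducer_name:>4}  {clf_name:<20} {acc:.4f}")
```

Wrapping each configuration in a pipeline keeps the scaler and the PCA or LDA projection from seeing the test data, which matters when comparing accuracies before and after reduction.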
Medicine, Computer Science