Classification of Potential Breast/Colorectal Cancer Cases Using Machine Learning Methods

Maryam Jafarpour, Ali Moeini, Niloofar Maryami, Azin Nahvijou, Ayoub Mohammadian
DOI: https://doi.org/10.5812/ijcm-135724
2023-05-17
International Journal of Cancer Management
Abstract:
Background: The algorithmic classification of affected and healthy individuals by gene expression has been a topic of interest to researchers in numerous domains, including cancer. Several studies have proposed solutions, such as neural networks and support vector machines (SVMs), to classify a diverse range of cancer cases. The accuracy of such classifications depends heavily on the optimization approach and the choice of a suitable kernel.
Objectives: This study aimed to propose a method that uses machine learning to classify cancer-prone and healthy cases for breast cancer and colorectal cancer (CRC) efficiently and with higher classification accuracy.
Methods: This study presented an algorithm to diagnose individuals prone to breast cancer and CRC. The novelty of this algorithm lies in its suitable kernel and its feature extraction approach. Using this algorithm, the study first identified the genes closely associated with these types of cancer and then used an SVM to find individuals susceptible to them. The study highlighted the indirect gene expressions associated with these cancers, which might indicate health complications in patients. To this end, the algorithm combines SVMs with k-fold cross-validation.
Results: The approach outperformed commonly used neural networks. The algorithm's identification accuracy values were 98.077% and 99.806% for breast cancer and CRC, respectively. A graphic representation of the cause-effect relationships was also provided to help researchers better understand the trend of cancer or other types of diseases.
Conclusions: The feature extraction method strongly affects the accuracy of the classification. In addition, relying on the expressions of indirect disease-triggering genes highlights a cause-effect relationship between genes and diseases. Such relationships can form Markov models in the clinical domain, leading to treatment paths and the prediction of patient outcomes.
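The Methods describe an SVM evaluated with k-fold validation. The following is a minimal sketch of that general pattern, not the authors' algorithm: the abstract does not specify the kernel or the feature extraction step, so this example assumes a plain linear SVM trained by subgradient descent on the hinge loss, and it runs on synthetic data rather than real gene expression profiles. All function names and parameter values (`make_synthetic`, `train_linear_svm`, `k_fold_indices`, `cross_validate`, the learning rate, and so on) are hypothetical and chosen only for illustration.

```python
import random


def make_synthetic(n_per_class=40, n_genes=5, seed=0):
    """Hypothetical stand-in for expression data: two well-separated classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y


def decision(w, b, x):
    """Signed score of sample x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Linear SVM (assumed kernel) fitted by subgradient descent on hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            # L2 regularization shrinks the weights on every step.
            w = [wj - eta * lam * wj for wj in w]
            if violated:
                # The hinge-loss subgradient is non-zero only inside the margin.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5):
    """Train on k-1 folds, test on the held-out fold, and repeat for each fold."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w, b = train_linear_svm([X[i] for i in train_idx],
                                [y[i] for i in train_idx])
        correct = sum(
            1 for i in test_idx
            if (1 if decision(w, b, X[i]) >= 0 else -1) == y[i]
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


if __name__ == "__main__":
    X, y = make_synthetic()
    accs = cross_validate(X, y, k=5)
    print("per-fold accuracy:", accs)
    print("mean accuracy:", sum(accs) / len(accs))
```

Accuracy on this toy data says nothing about the figures reported in the abstract; the sketch only illustrates how each sample is held out exactly once while the classifier is trained on the remaining folds.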