Machine learning based identification of candidate miRNA biomarkers for micro-invasive breast cancer diagnosis

Jayanta K Pal, Bhadresh R Rami
DOI: https://doi.org/10.1101/2024.08.16.608025
2024-08-19
Abstract:
Purpose: Cancer can be detected early by analyzing miRNA expression patterns. miRNAs play a significant role in biological processes and have been identified as major biomarkers in cancer. miRNAs can also be detected in human blood, a micro-invasive way of collecting samples, which makes the diagnostic procedure much less stressful for patients. In this article, we focus on identifying blood-derived miRNAs that are associated with breast cancer and could serve as biomarkers.
Methods: We use three breast cancer datasets obtained from blood samples. A combination of multiple feature selection and classification models is used to classify normal vs. cancer samples. In the first stage, the significant miRNAs associated with cancer are selected using (a) weights assigned by classifiers and (b) feature selection algorithms. In the second stage, we apply multiple classifiers to assess the diagnostic capability of the selected miRNAs as potential biomarkers.
Results: The miRNA selection stage identified ten miRNAs, which were then analyzed with multiple classifiers for their ability to distinguish between normal and cancerous cases. Performance is evaluated with 5-fold cross-validation using precision, recall, F1-score, and accuracy, and with confusion matrices. For two of the three datasets, we achieve satisfactory normal vs. cancer classification performance.
Conclusion: We observe that high miRNA expression levels are relatively more important than sample size for effective blood-based diagnosis of breast cancer.
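The two-stage procedure described in the Methods can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the specific scorers (logistic-regression coefficient magnitudes and random-forest importances as classifier-assigned weights; the ANOVA F-test and mutual information as feature selection algorithms), the average-rank consensus used to pick ten miRNAs, the four classifiers, the helper names `select_mirnas`, `evaluate` and `CLASSIFIERS`, and the synthetic data are all hypothetical choices.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_mirnas(X, y, k=10, seed=0):
    """Stage 1 (hypothetical): indices of the k miRNAs with the best average rank."""
    Xs = StandardScaler().fit_transform(X)
    scores = [
        # (a) classifier-assigned weights
        np.abs(LogisticRegression(max_iter=1000).fit(Xs, y).coef_[0]),
        RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y).feature_importances_,
        # (b) filter-style feature selection algorithms
        np.nan_to_num(f_classif(X, y)[0]),
        mutual_info_classif(X, y, random_state=seed),
    ]
    # Rank 1 = most important under each scorer; average the ranks across scorers.
    mean_rank = np.mean([rankdata(-s) for s in scores], axis=0)
    return np.argsort(mean_rank)[:k]


# Stage 2 (hypothetical choice of classifiers); factories give each fold a fresh model.
CLASSIFIERS = {
    "logreg": lambda: make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": lambda: make_pipeline(StandardScaler(), SVC()),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": lambda: make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}


def evaluate(X, y, classifiers, k=10, n_splits=5, seed=0):
    """5-fold stratified CV; miRNAs are re-selected on each training fold only."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    folds = [(tr, te, select_mirnas(X[tr], y[tr], k, seed)) for tr, te in cv.split(X, y)]
    results = {}
    for name, make_clf in classifiers.items():
        y_pred = np.empty_like(y)
        for tr, te, sel in folds:
            clf = make_clf().fit(X[tr][:, sel], y[tr])
            y_pred[te] = clf.predict(X[te][:, sel])
        # Metrics on the pooled out-of-fold predictions (assumes label 1 = cancer).
        results[name] = {
            "precision": precision_score(y, y_pred),
            "recall": recall_score(y, y_pred),
            "f1": f1_score(y, y_pred),
            "accuracy": accuracy_score(y, y_pred),
            "confusion": confusion_matrix(y, y_pred),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for a blood miRNA expression matrix (samples x miRNAs).
    X, y = make_classification(n_samples=120, n_features=300, n_informative=10,
                               n_redundant=0, random_state=0)
    for name, m in evaluate(X, y, CLASSIFIERS).items():
        print(f"{name}: precision={m['precision']:.2f} recall={m['recall']:.2f} "
              f"f1={m['f1']:.2f} accuracy={m['accuracy']:.2f}")
        print(m["confusion"])
```

In this sketch the miRNAs are selected inside each training fold rather than once on the full dataset, so the held-out fold plays no part in selection; selecting on all samples first and then cross-validating tends to give optimistic estimates. Pooling the out-of-fold predictions yields one confusion matrix per classifier; averaging per-fold scores is an equally common alternative.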
The novelty of our investigation lies in combining three aspects, viz., blood-based breast cancer diagnosis, the use of multiple ML-based feature selection algorithms to identify miRNAs associated with breast cancer, and the evaluation of the selected miRNAs with various classifiers, which also tests the robustness of these ML models in feature selection and classification.
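One common way to quantify the robustness of a feature selection method is its stability: how similar the selected miRNA sets are when selection is repeated on different training folds. The sketch below is an assumption about how such robustness could be measured, not necessarily the measure used in the paper; the helpers `top_k`, `jaccard` and `selection_stability`, the choice of the ANOVA F-test and mutual information as example selectors, and the synthetic data are all hypothetical.

```python
import itertools
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.model_selection import StratifiedKFold


def top_k(score_func, k):
    """Hypothetical selector factory: (X, y) -> set of indices of the k best features."""
    def select(X, y):
        support = SelectKBest(score_func, k=k).fit(X, y).get_support(indices=True)
        return set(support.tolist())
    return select


def jaccard(a, b):
    """Jaccard similarity of two index sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def selection_stability(X, y, selector, n_splits=5, seed=0):
    """Mean pairwise Jaccard similarity of the sets chosen on each CV training fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    sets = [selector(X[train], y[train]) for train, _ in cv.split(X, y)]
    return float(np.mean([jaccard(a, b) for a, b in itertools.combinations(sets, 2)]))


if __name__ == "__main__":
    # Synthetic stand-in for a blood miRNA expression matrix (samples x miRNAs).
    X, y = make_classification(n_samples=120, n_features=300, n_informative=10,
                               n_redundant=0, random_state=0)
    selectors = {
        "anova": top_k(f_classif, 10),
        "mutual_info": top_k(partial(mutual_info_classif, random_state=0), 10),
    }
    for name, sel in selectors.items():
        print(name, "stability:", round(selection_stability(X, y, sel), 2))
    # Agreement between the two methods on the full data.
    overlap = jaccard(selectors["anova"](X, y), selectors["mutual_info"](X, y))
    print("anova vs mutual_info overlap:", round(overlap, 2))
```

A stability close to 1 means a method picks nearly the same miRNAs regardless of which samples it sees; the cross-method overlap indicates whether different selection algorithms converge on the same candidates.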
Cancer Biology