Enhancing breast cancer screening with urinary biomarkers and Random Forest supervised classification: a comprehensive investigation

Eugenio Alladio, Fulvia Trapani, Lorenzo Castellino, Marta Massano, Daniele Di Corcia, Alberto Salomone, Enrico Berrino, Riccardo Ponzone, Caterina Marchiò, Anna Sapino, Marco Vincenti
DOI: https://doi.org/10.1016/j.jpba.2024.116113
IF: 3.571
2024-03-22
Journal of Pharmaceutical and Biomedical Analysis
Abstract: Objectives: Urinary sex hormones are investigated as potential biomarkers for the early detection of breast cancer, aiming to evaluate their relevance and applicability, in combination with supervised machine-learning data analysis, toward the ultimate goal of extensive screening. Methods: Sex hormones were determined in urine samples collected from 250 post-menopausal women (65 healthy and 185 with breast cancer), recruited among the clinical patients of Candiolo Cancer Institute FPO-IRCCS (Torino, Italy). Two analytical procedures based on UHPLC-MS/HRMS were developed and comprehensively validated to quantify 20 free and conjugated sex hormones in urine samples. The quantitative data were processed by seven machine learning algorithms, and the efficiency of the resulting models was compared. Results: Among the tested models, which aimed to relate urinary estrogen and androgen levels to the occurrence of breast cancer, Random Forest (RF) outperformed all the other supervised classification approaches, including Partial Least Squares – Discriminant Analysis (PLS-DA), in terms of effectiveness and robustness. The final optimized model, built on only five biomarkers (testosterone-sulphate, alpha-estradiol, 4-methoxyestradiol, DHEA-sulphate, and epitestosterone-sulphate), achieved approximately 98% diagnostic accuracy on replicated validation sets. To balance the less-represented population of healthy women, a Synthetic Minority Oversampling TEchnique (SMOTE) data oversampling approach was applied. Conclusions: Through optimization of its tunable hyperparameters, the RF algorithm showed great potential for early breast cancer detection, as it provides a clear ranking of the biomarkers and their relative efficiency. This allows the final diagnostic model to be grounded on a restricted selection of only five steroid biomarkers, as is desirable for noninvasive tests with wide screening purposes.
pharmacology & pharmacy; chemistry, analytical
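
The abstract mentions SMOTE oversampling of the 65 healthy samples against 185 cancer samples but does not say how it was implemented. The following is a minimal pure-Python sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' code. The function name `smote`, the parameters `k` and `seed`, and the toy two-feature values are all illustrative assumptions; only the class sizes come from the abstract.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x (x itself excluded).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        u = rng.random()  # position along the segment from x to nb
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy values only (hypothetical, not data from the paper).
healthy = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4], [1.2, 2.1]]

# Class sizes from the abstract: 185 cancer vs 65 healthy samples.
n_needed = 185 - 65
new_points = smote(healthy, n_new=n_needed, k=3)
```

In a sketch like this, every synthetic point lies on a line segment between two real minority samples, so oversampling adds no values outside the observed range of the healthy group. In practice, oversampling is normally applied to the training split only, so that synthetic points do not leak into validation sets.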