Explainable Machine Learning Models Using Robust Cancer Biomarkers Identification from Paired Differential Gene Expression

Elisa Díaz de la Guardia-Bolívar, Juan Emilio Martínez Manjón, David Pérez-Filgueiras, Igor Zwir, Coral del Val
DOI: https://doi.org/10.3390/ijms252212419
IF: 5.6
2024-11-24
International Journal of Molecular Sciences
Abstract: In oncology, there is a critical need for robust biomarkers that can be easily translated into the clinic. We introduce a novel approach using paired differential gene expression analysis for biological feature selection in machine learning models, enhancing robustness and interpretability while accounting for patient variability. This method compares primary tumor tissue with the same patient's healthy tissue, improving gene selection by eliminating individual-specific artifacts. We focus on carcinoma because of its prevalence and the availability of data, and we aim to identify biomarkers involved in general carcinoma progression, including in less-researched types. We identified 27 pivotal genes that can distinguish between healthy and carcinoma tissue, even in unseen carcinoma types. Additionally, the panel could precisely identify the tissue of origin in the eight carcinoma types used in the discovery phase. Notably, in a proof of concept, the model accurately identified the primary tissue of origin in metastatic samples despite limited sample availability. Functional annotation reveals these genes' involvement in cancer hallmarks and detects subtle variations across carcinoma types. We propose paired differential gene expression analysis as a reference method for the discovery of robust biomarkers.
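The paired design described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the statistical test used, so a plain paired t-statistic stands in here, and the gene names (GENE_A, GENE_B), the log2 expression values, and the function names are all hypothetical. The point it shows is that differencing each tumor sample against the same patient's healthy sample cancels patient-specific baseline shifts before genes are ranked.

```python
import math
import statistics


def paired_t_statistic(tumor, normal):
    """Return (mean difference, paired t-statistic) for one gene.

    tumor[i] and normal[i] must come from the same patient, so the
    per-patient difference removes that patient's baseline level.
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched tumor/normal pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        # Identical differences in every patient: t is unbounded (or 0 if no change).
        t_stat = math.copysign(math.inf, mean_diff) if mean_diff != 0 else 0.0
        return mean_diff, t_stat
    return mean_diff, mean_diff / (sd_diff / math.sqrt(len(diffs)))


def rank_genes(tumor_by_gene, normal_by_gene):
    """Rank genes by absolute paired t-statistic, strongest first."""
    results = []
    for gene, tumor in tumor_by_gene.items():
        mean_diff, t_stat = paired_t_statistic(tumor, normal_by_gene[gene])
        results.append((gene, mean_diff, t_stat))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical log2 expression values for four patients (same patient order in both dicts).
tumor = {
    "GENE_A": [8.1, 9.0, 7.5, 8.8],  # baselines vary by patient, shift is consistent
    "GENE_B": [5.0, 6.2, 4.1, 5.5],  # no consistent tumor/normal shift
}
normal = {
    "GENE_A": [6.0, 7.1, 5.4, 6.9],
    "GENE_B": [5.1, 6.0, 4.3, 5.4],
}

ranked = rank_genes(tumor, normal)
for gene, mean_diff, t_stat in ranked:
    print(f"{gene}: mean log2 difference = {mean_diff:.2f}, paired t = {t_stat:.2f}")
```

In this toy data the baseline level of GENE_A differs by more than one log2 unit between patients, yet its within-patient difference is nearly constant, so it ranks first. A real analysis would additionally need normalization, multiple-testing correction, and a dedicated differential expression tool, none of which this sketch attempts.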
biochemistry & molecular biology,chemistry, multidisciplinary