EsoDetect: Computational Validation and Algorithm Development of a Novel Diagnostic and Prognostic Tool for Dysplasia in Barrett's Esophagus

Migla Miskinyte, Benilde Pondeca, Jose B Pereira-Leal, Joana Cardoso
DOI: https://doi.org/10.1101/2024.11.26.24317976
2024-12-04
Abstract: Barrett's esophagus (BE) is the only known precursor to esophageal adenocarcinoma (EAC), a malignancy with increasing incidence and unfavorable prognosis. This study aims to identify biomarkers capable of diagnosing low-grade dysplasia (LGD) in BE, as well as biomarkers that predict progression from BE to EAC, for subsequent integration into diagnostic and prognostic algorithms. Datasets containing gene expression data from metaplastic and dysplastic BE, as well as EAC tissue samples, were collected from public databases and used to explore gene expression patterns that differentiate between non-dysplastic (ND) and LGD BE (for diagnostic purposes) and between non-progressed and progressed BE (for prognostic purposes). Specifically, three RNAseq datasets were employed for the diagnostic application, while for the prognostic application nine microarray datasets were identified and 25 previously described genes were validated. A Thresholding Function was applied to each gene to determine the optimal gene expression threshold for group differentiation, and all analyzed genes were then ranked by F1-score. After the best-performing genes were identified, different classifiers were trained, and the best algorithms for the diagnostic and prognostic applications were selected. Ranking the biomarkers across the analyzed datasets yielded eighteen diagnostic genes and fifteen prognostic genes, which were used for further algorithm development. Ultimately, a linear support vector machine algorithm incorporating ten genes was identified for the diagnostic application, while a radial basis function support vector machine algorithm, also utilizing ten genes, was selected for prognostic prediction. Notably, both classifiers achieved recall and specificity scores exceeding 0.90.
The identified algorithms, along with their associated biomarkers, hold significant potential to aid in the early management of malignant progression of BE. Their strengths lie in their development using multiple independent datasets and their ability to demonstrate recall and specificity levels superior to those reported in the existing literature. Ongoing experimental and clinical validation is essential to further substantiate their utility and effectiveness, and to ensure that these tools can be reliably integrated into clinical practice to improve patient outcomes.
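The per-gene thresholding and F1-based ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the Thresholding Function searches for its cutoff, so the candidate cut points (midpoints between consecutive distinct expression values), the two tested directions, the function names, and the toy gene names and values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    expr: Sequence[float], labels: Sequence[int]
) -> Tuple[float, str, float]:
    """Return (threshold, direction, f1) maximizing F1 for one gene.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct expression values, and a sample is called positive when its
    expression is "above" or "below" the threshold, whichever scores better.
    """
    values = sorted(set(expr))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    # NaN threshold signals that no candidate produced a positive F1.
    best = (float("nan"), "above", 0.0)
    for thr in candidates:
        for direction in ("above", "below"):
            if direction == "above":
                pred = [int(x > thr) for x in expr]
            else:
                pred = [int(x < thr) for x in expr]
            score = f1_score(labels, pred)
            if score > best[2]:
                best = (thr, direction, score)
    return best


def rank_genes(
    expression: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float, str, float]]:
    """Rank genes by their best achievable F1, highest first."""
    rows = [(gene, *best_threshold(vals, labels)) for gene, vals in expression.items()]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 0 = non-dysplastic, 1 = low-grade dysplasia.
    labels = [0, 0, 0, 1, 1, 1]
    expression = {
        "GENE_A": [1.0, 2.0, 1.5, 5.0, 6.0, 5.5],  # cleanly separable
        "GENE_B": [2.0, 4.0, 1.0, 3.0, 5.0, 2.5],  # overlapping groups
    }
    for gene, thr, direction, f1 in rank_genes(expression, labels):
        print(f"{gene}: threshold={thr}, positive if {direction}, F1={f1:.2f}")
```

In a setting like the one described, the top-ranked genes from such a ranking would then serve as input features for the classifiers; evaluating thresholds on held-out data rather than on the data used to choose them would be needed to avoid optimistic scores.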
Oncology