Leveraging cfDNA fragmentomic features in a stacked ensemble model for early detection of esophageal squamous cell carcinoma.

Zichen Jiao, Xiaoqiang Zhang, Yulong Xuan, Xiaoming Shi, Zirui Zhang, Ao Yu, Ningyou Li, Shanshan Yang, Xiaofeng He, Gefei Zhao, Ruowei Yang, Jianqun Chen, Xuxiaochen Wu, Xiaoxi Chen, Hua Bao, Fufeng Wang, Wei Ren, Hongwei Liang, Qihan Chen, Tao Wang
DOI: https://doi.org/10.1200/jco.2024.42.16_suppl.4054
IF: 45.3
2024-06-01
Journal of Clinical Oncology
Abstract: 4054
Background: In this study, we developed a stacked ensemble model that leverages cell-free DNA (cfDNA) fragmentation for the early detection of esophageal squamous cell carcinoma (ESCC). The model combined four fragmentomic features obtained from whole-genome sequencing (WGS) and employed four machine learning algorithms. We evaluated the model's generalizability in an independent validation cohort and in an external cohort collected at a different center. We also assessed the model's robustness and repeatability on low-coverage and repeatedly measured samples. The results indicate that the model is a promising strategy for early diagnosis and management of ESCC in clinical settings.
Methods: We enrolled 256 healthy individuals and 243 patients diagnosed with esophageal cancer, including 47 healthy participants and 44 patients in the external cohort. Plasma samples from all participants were profiled by WGS. Four fragmentomic features, namely copy number variation (CNV), fragmentation size coverage (FSC), fragmentation size distribution (FSD), and nucleosome positioning (NP), were combined with machine learning models to develop an optimized classification model. The model generated a cancer score ranging from 0 to 1 for each cancer and noncancer sample, with a score closer to 1 indicating a higher probability of cancer. The model's performance was assessed in the independent validation cohort and the external cohort, and its robustness and reproducibility were examined.
Results: A stacked ensemble model was developed by integrating four cfDNA features and four machine learning algorithms. In the independent validation cohort, the integrated model achieved a sensitivity of 91.8% (89/97), a specificity of 98.1% (102/104), and an AUC of 0.986, using a cut-off of 0.69 selected at a specificity level of 98% (105) in the training cohort. With the same cut-off, the external cohort showed a sensitivity of 86.4% (38/44) and a specificity of 95.7% (45/47). Performance remained consistent at a low sequencing coverage depth of 0.5× (AUC 0.978), indicating that the method is robust and applicable when sequencing resources are limited. The model also remained sensitive to early-stage disease, with a sensitivity of 94.1% (16/17) for stage I and 91.4% (53/58) for tumors smaller than 3 cm.
Conclusions: This study implemented a stacked ensemble model based on cfDNA fragmentomic features that showed high sensitivity for early detection of esophageal squamous cell carcinoma. The model is expected to improve early detection strategies for esophageal cancer in clinical settings.
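The cut-off selection and the reported proportions can be illustrated with a minimal Python sketch. It assumes, since the abstract does not say so, that a sample is called cancer when its score is at or above the cut-off, and that the cut-off is the lowest healthy training score that meets the target specificity. The function names and toy scores are hypothetical and are not taken from the study; only the counts in `reported` come from the abstract.

```python
from typing import Sequence


def specificity(healthy_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of healthy samples scored below the cut-off (called noncancer)."""
    return sum(s < cutoff for s in healthy_scores) / len(healthy_scores)


def sensitivity(cancer_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of cancer samples scored at or above the cut-off (called cancer)."""
    return sum(s >= cutoff for s in cancer_scores) / len(cancer_scores)


def cutoff_at_specificity(healthy_scores: Sequence[float], target: float) -> float:
    """Lowest observed healthy score whose use as cut-off reaches the target specificity.

    Assumption: candidate cut-offs are the healthy training scores themselves.
    Returns infinity if no observed score reaches the target.
    """
    for candidate in sorted(set(healthy_scores)):
        if specificity(healthy_scores, candidate) >= target:
            return candidate
    return float("inf")


# Toy training scores (made up for illustration only).
toy_healthy = [0.05, 0.10, 0.20, 0.30, 0.70]
toy_cutoff = cutoff_at_specificity(toy_healthy, target=0.8)
print(f"toy cut-off: {toy_cutoff}")

# Counts reported in the abstract, recomputed as percentages.
reported = {
    "validation sensitivity": (89, 97),
    "validation specificity": (102, 104),
    "external sensitivity": (38, 44),
    "external specificity": (45, 47),
    "stage I sensitivity": (16, 17),
    "tumor < 3 cm sensitivity": (53, 58),
}
for name, (hits, total) in reported.items():
    print(f"{name}: {hits}/{total} = {100 * hits / total:.1f}%")
```

Fixing the cut-off on the training cohort and reusing it unchanged in the validation and external cohorts, as the abstract describes, keeps the reported sensitivity and specificity free of threshold tuning on the evaluation data.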
oncology