Machine learning-based genome-wide interrogation of somatic copy number aberrations in circulating tumor DNA for early detection of hepatocellular carcinoma.

Kaishan Tao, Zhenyuan Bian, Qiong Zhang, Xu Guo, Chun Yin, Yang Wang, Kaixiang Zhou, Shaogui Wan, Meifang Shi, Dengke Bao, Chuhu Yang, Jinliang Xing
DOI: https://doi.org/10.1016/j.ebiom.2020.102811
IF: 11.205
2020-01-01
EBioMedicine
Abstract:
BACKGROUND: DNA released from tumor cells into the blood (circulating tumor DNA, ctDNA) carries tumor-specific genomic aberrations, providing a non-invasive means of cancer detection. In this study, we aimed to leverage somatic copy number aberrations (SCNAs) in ctDNA to develop assays for detecting early-stage hepatocellular carcinoma (HCC).
METHODS: We conducted low-depth whole-genome sequencing (WGS) to profile SCNAs in 384 plasma samples from patients with hepatitis B virus (HBV)-related HCC and cancer-free HBV patients, using one discovery cohort and two validation cohorts. To fully capture the robust signals of WGS data from the complete genome, we developed a machine learning-based statistical model focused on detection accuracy in early-stage HCC.
FINDINGS: We built the model using a discovery cohort of 209 patients, achieving an overall area under the curve (AUC) of 0.893, with 0.874 for early-stage (Barcelona Clinic Liver Cancer [BCLC] stage 0-A) and 0.933 for advanced-stage (BCLC stage B-D) HCC. The performance of the model was then assessed in two validation cohorts (76 and 99 patients) that consisted only of patients with stage 0-A HCC. Our model exhibited robust predictive performance, with AUCs of 0.920 and 0.812 in the two validation cohorts, respectively. Further analyses showed the impact of tumor sample heterogeneity in model training on detecting early-stage tumors, and a refined model addressing the heterogeneity in the discovery cohort significantly increased model performance in validation.
INTERPRETATION: We developed an SCNA-based, machine learning-driven model for the non-invasive detection of early-stage HCC in HBV patients and demonstrated its performance through strict independent validations.