Advanced deep-learning algorithm for multi-cancer detection using cf-WGS.
Tae-Rim Lee, Eunhae Cho, Junnam Lee, Jin Mo Ahn, Dasom Kim, Chang Seok Ki, Minsoo Kim, Jae Sung Lee, Ji Hye Sohn, Sook Ryun Park, Ki Byung Song, Eunsung Jun, Dongryul Oh, Jeong-Won Lee, Young Sik Park, Gi-Won Song, Jeong-Sik Byeon, Bo Hyun Kim
DOI: https://doi.org/10.1200/jco.2023.41.16_suppl.e13560
IF: 45.3
2023-06-01
Journal of Clinical Oncology
Abstract: e13560. Background: Applying machine learning to whole genome sequencing (WGS) of circulating cell-free DNA (cfDNA) has the potential to be a valuable tool for early cancer detection. However, unless batch effects are taken into account, detecting very weak cancer signals is prone to overfitting when the number of training samples is relatively small. This study aimed to develop an advanced deep-learning model for cancer detection from cfDNA WGS data with minimal batch effect. Methods: We generated low-depth cfDNA WGS data from 412 cancer patients and 1,269 healthy individuals. The size and end-motif frequency of each fragment were measured and represented as a two-dimensional matrix. To remove batch effects between sequencing runs, each sample's data were normalized against reference samples in the same batch prior to model training. The samples were divided into 4 datasets stratified by sequencing batch for cross-validation of the models. The final model combines a grouped convolutional neural network (CNN) with a long short-term memory (LSTM) model for training on sequential size information. Results: The model showed an accuracy of 93.1% and an AUC of 0.97. At a 95% specificity threshold, the model showed an overall sensitivity of 87% and a precision of 84.7%. By cancer type, sensitivities for liver, ovarian, esophageal, pancreatic, and lung cancer were 94.4%, 81.1%, 85.6%, 100%, and 78.2%, respectively. By cancer stage, sensitivity at the 95% specificity threshold was 78.6% in stage I, 86.5% in stage II, 91.5% in stage III, and 92.7% in stage IV. Conclusions: The deep learning models trained on cfDNA fragment size and end-motif frequency demonstrated promising results for cancer detection. The improved performance of this model highlights the potential for improving cancer detection by combining an advanced deep learning algorithm with well-curated training data for batch effect correction.
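The feature construction described in Methods (a size by end-motif frequency matrix, normalized against same-batch reference samples) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the motif length, the size range, or the normalization formula, so the 4-mer motifs, the example sizes, and the per-cell z-score used here are all assumptions, and the function names `feature_matrix` and `normalize_to_batch_references` are hypothetical.

```python
from statistics import mean, pstdev


def feature_matrix(fragments, sizes, motifs):
    """Return a len(sizes) x len(motifs) matrix of fragment frequencies.

    `fragments` is an iterable of (length, end_motif) pairs. Fragments whose
    length or motif is not on the chosen axes are ignored.
    """
    size_index = {s: i for i, s in enumerate(sizes)}
    motif_index = {m: j for j, m in enumerate(motifs)}
    counts = [[0] * len(motifs) for _ in sizes]
    total = 0
    for length, motif in fragments:
        i = size_index.get(length)
        j = motif_index.get(motif)
        if i is None or j is None:
            continue
        counts[i][j] += 1
        total += 1
    if total == 0:
        raise ValueError("no fragments fall inside the matrix axes")
    # Convert raw counts to frequencies so that sequencing depth cancels out.
    return [[c / total for c in row] for row in counts]


def normalize_to_batch_references(sample, references, eps=1e-9):
    """Z-score each cell of `sample` against reference matrices from its batch.

    ASSUMPTION: a per-cell z-score is one plausible normalization; the
    abstract only says samples were normalized with same-batch references.
    """
    if len(references) < 2:
        raise ValueError("need at least two reference samples per batch")
    normalized = []
    for i, row in enumerate(sample):
        out_row = []
        for j, value in enumerate(row):
            ref_values = [ref[i][j] for ref in references]
            mu = mean(ref_values)
            sigma = pstdev(ref_values)
            # eps guards against division by zero when references agree exactly.
            out_row.append((value - mu) / (sigma + eps))
        normalized.append(out_row)
    return normalized


# Toy usage with hypothetical axes: two fragment sizes and two 4-mer end motifs.
sizes = [150, 151]
motifs = ["CCCA", "CCAG"]
fragments = [(150, "CCCA"), (150, "CCCA"), (151, "CCAG"), (167, "CCCA")]
matrix = feature_matrix(fragments, sizes, motifs)
```

Because each sample is expressed relative to references sequenced in the same run, a shift that affects the whole batch moves the sample and its references together and largely cancels, which is the motivation the abstract gives for normalizing before model training.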
oncology