Noninvasive Cancer Detection by Extracting and Integrating Multi-Modal Data from Whole-Methylome Sequencing of Plasma Cell-Free DNA

Fenglong Bie, Zhijie Wang, Yulong Li, Yuanyuan Hong, Tiancheng Han, Fang Lv, Shunli Yang, Suxing Li, Xi Li, Peiyao Nie, Ruochuan Zang, Moyan Zhang, Peng Song, Fenling Feng, Wei Guo, Jianchun Duan, Guangyu Bai, Yuan Li, Qilin Huai, Bolun Zhou, Yu S. Huang, Weizhi Chen, Fengwei Tan, Shugeng Gao
DOI: https://doi.org/10.1101/2022.07.04.498641
2022-01-01
Abstract: Plasma cell-free DNA (cfDNA) methylation and fragmentation signatures have been shown to be valid biomarkers for blood-based cancer detection. However, conventional methylation sequencing assays are inapplicable for fragmentomic profiling due to bisulfite-induced DNA damage. Here, using enzymatic conversion-based low-pass whole-methylome sequencing (WMS), we developed a novel approach to comprehensively interrogate genome-wide plasma methylation, fragmentation, and copy number profiles for sensitive and noninvasive multi-cancer detection. Plasma WMS data were obtained from a clinical cohort comprising 497 healthy controls and 780 patients with early- or advanced-stage cancers of the breast, colorectum, esophagus, stomach, liver, lung, or pancreas. Genomic features, including methylation, fragment size, copy number alteration, and fragment end motif, were extracted individually and subsequently integrated using machine learning algorithms to develop an ensemble cancer classifier called THEMIS. THEMIS outperformed the individual biomarkers in differentiating cancer patients of all seven types from healthy individuals and achieved a combined area under the curve (AUC) of 0.971 in the independent test cohort, translating to a sensitivity of 86% and an early-stage (I and II) sensitivity of 77% at 99% specificity. In addition, using true-positive cancer samples at 100% specificity, we built a cancer signal origin classifier based on methylation and fragmentation profiling of tissue-specific accessible regulatory elements, which localized cancer-like signal to a limited number of clinically informative sites with 66% accuracy. Overall, this proof-of-concept work demonstrates the feasibility of extracting and integrating multi-modal biomarkers from a single WMS run for noninvasive detection and localization of common cancers across stages.
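The abstract reports sensitivity at a fixed 99% specificity. As a minimal illustration of how such a figure can be derived from classifier scores, the sketch below picks a score cutoff from the healthy controls and then measures the fraction of cancer samples above it. The function name `sensitivity_at_specificity`, the thresholding rule, and the toy scores are assumptions made for illustration; the abstract does not describe how the authors selected their cutoff, and this is not the THEMIS implementation.

```python
import numpy as np


def sensitivity_at_specificity(control_scores, cancer_scores, specificity=0.99):
    """Return (threshold, sensitivity, achieved_specificity).

    Hypothetical helper: a sample is called positive when its score is
    strictly greater than the threshold. The threshold is taken from the
    control scores so that at most (1 - specificity) of controls are
    called positive.
    """
    controls = np.sort(np.asarray(control_scores, dtype=float))
    cancers = np.asarray(cancer_scores, dtype=float)

    # Maximum number of controls that may be false positives.
    # The small epsilon guards against floating-point round-down.
    max_fp = int(np.floor((1.0 - specificity) * controls.size + 1e-9))

    if max_fp >= controls.size:
        # Every control may be positive, so any score counts as positive.
        threshold = -np.inf
    else:
        # Highest control score that must still be called negative.
        threshold = controls[controls.size - max_fp - 1]

    sensitivity = float(np.mean(cancers > threshold))
    achieved_specificity = float(np.mean(controls <= threshold))
    return threshold, sensitivity, achieved_specificity


if __name__ == "__main__":
    # Toy scores, not data from the study.
    toy_controls = np.arange(100) / 100.0
    toy_cancers = [0.5, 0.985, 0.99, 1.2]
    print(sensitivity_at_specificity(toy_controls, toy_cancers))
```

With tied control scores the achieved specificity can exceed the target, because the cutoff never admits more false positives than the budget allows. In practice the cutoff would be fixed on training data and then applied unchanged to an independent test cohort.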