Integrated 5-hydroxymethylcytosine and fragmentation signatures as enhanced biomarkers in lung cancer

Xinlei Hu, Kai Luo, Hui Shi, Xiaoqin Yan, Ruichen Huang, Bi Zhao, Jun Zhang, Dan Xie, Wei Zhang
DOI: https://doi.org/10.1186/s13148-022-01233-7
2022-01-24
Abstract:
Background: Lung cancer is one of the most common cancers worldwide, with a 5-year survival rate of less than 20%, mainly because of late-stage diagnosis. Noninvasive methods using 5-hydroxymethylation of cytosine (5hmC) modifications and fragmentation profiles from 5hmC cell-free DNA (cfDNA) sequencing provide an opportunity for lung cancer detection and management.
Results: A total of 157 lung cancer patients were recruited to generate the largest lung cancer cfDNA 5hmC dataset, which mainly consisted of 62 lung adenocarcinoma (LUAD), 48 lung squamous cell carcinoma (LUSC) and 25 small cell lung cancer (SCLC) patients, with most patients (131, 83.44%) at advanced tumor stages. A 37-feature 5hmC model was constructed and validated to distinguish lung cancer patients from healthy controls, with areas under the curve (AUCs) of 0.8938 and 0.8476 (sensitivity = 87.50% and 72.73%, specificity = 83.87% and 80.60%) in two distinct validation sets. Furthermore, fragment profiles of cfDNA 5hmC datasets were explored for the first time to develop a 48-feature fragmentation model with good performance (AUC = 0.9257 and 0.822, sensitivity = 87.50% and 78.79%, specificity = 80.65% and 76.12%) in the two validation sets. A third diagnostic model integrating 5hmC signals and fragment profiles improved the AUC to 0.9432 and 0.8639 (sensitivity = 87.50% and 83.33%, specificity = 90.30% and 77.61%) in the two validation sets, outperforming the models based on either signal alone and performing well across tumor stages and lung cancer subtypes. Several 5hmC markers were found to be associated with overall survival (OS) and disease-free survival (DFS) based on gene expression data from The Cancer Genome Atlas (TCGA).
Conclusions: Both the 5hmC signal and the fragmentation profiles in 5hmC cfDNA data are sensitive and effective for lung cancer detection and could be incorporated into a single diagnostic model to achieve good performance, supporting further research on clinical diagnostic models based on cfDNA 5hmC data.