Genomic Origin, Fragmentomics, and Transcriptional Properties of Long Cell-Free DNA Molecules in Human Plasma.

Huiwen Che, Peiyong Jiang, L. Y. Lois Choy, Suk Hang Cheng, Wenlei Peng, Rebecca W. Y. Chan, Jing Liu, Qing Zhou, W. K. Jacky Lam, Stephanie C. Y. Yu, So Ling Lau, Tak Y. Leung, John Wong, Vincent Wai-Sun Wong, Grace L. H. Wong, Stephen L. Chan, K. C. Allen Chan, Y. M. Dennis Lo
DOI: https://doi.org/10.1101/gr.278556.123
IF: 9.438
2024-01-01
Genome Research
Abstract: Recent studies have revealed an unexplored population of long cell-free DNA (cfDNA) molecules in human plasma using long-read sequencing technologies. However, the biological properties of long cfDNA molecules (>500 bp) remain largely unknown. To this end, we have investigated the origins of long cfDNA molecules from different genomic elements. Analysis of plasma cfDNA using long-read sequencing reveals an uneven distribution of long molecules across the genome. Long cfDNA molecules show overrepresentation in euchromatic regions of the genome, in sharp contrast to short DNA molecules. We observe a stronger relationship between the abundance of long molecules and mRNA gene expression levels, compared with short molecules (Pearson's r = 0.71 vs. -0.14). Moreover, long and short molecules show distinct fragmentation patterns surrounding CpG sites. Leveraging the cleavage preferences surrounding CpG sites, the combined cleavage ratios of long and short molecules can differentiate patients with hepatocellular carcinoma (HCC) from non-HCC subjects (AUC = 0.87). We also investigated knockout mice in which selected nuclease genes had been inactivated in comparison with wild-type mice. The proportion of long molecules originating from transcription start sites is lower in Dffb-deficient mice but higher in Dnase1l3-deficient mice compared with that of wild-type mice. This work thus provides new insights into the biological properties and potential clinical applications of long cfDNA molecules.
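The relationship between long-molecule abundance and gene expression reported in the abstract can be illustrated with a minimal sketch. It assumes each genomic region comes with a list of fragment lengths and one expression value; the function names (`long_fraction`, `pearson_r`), the region labels, and all numbers are hypothetical toy values, not data or code from the paper. Only the >500 bp threshold and the use of Pearson's r are taken from the abstract.

```python
from math import sqrt

# The abstract defines long cfDNA molecules as longer than 500 bp.
LONG_THRESHOLD_BP = 500


def long_fraction(fragment_lengths):
    """Return the fraction of fragments longer than LONG_THRESHOLD_BP."""
    if not fragment_lengths:
        raise ValueError("at least one fragment length is required")
    n_long = sum(1 for length in fragment_lengths if length > LONG_THRESHOLD_BP)
    return n_long / len(fragment_lengths)


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy input: region -> (fragment lengths in bp, expression level).
toy_regions = {
    "region_a": ([160, 170, 650, 900], 8.0),
    "region_b": ([150, 165, 166, 700], 3.5),
    "region_c": ([155, 160, 170, 180], 0.5),
}

fractions = [long_fraction(lengths) for lengths, _ in toy_regions.values()]
expression = [level for _, level in toy_regions.values()]
r = pearson_r(fractions, expression)
print(f"long-molecule fraction vs. expression: r = {r:.2f}")
```

The same `pearson_r` call could be repeated with the fraction of short molecules per region to contrast the two populations, which is the kind of comparison the abstract summarizes as r = 0.71 vs. -0.14.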