Identifying OCRs in cfDNA WGS Data by Correlation Clustering

Farshad Noravesh, Fahimeh Palizban
DOI: https://doi.org/10.48550/arXiv.2202.09618
2022-12-25
Abstract: Over the past decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the bloodstream and contribute to a pool of circulating fragments called cell-free DNA (cfDNA). Identifying the tissue of origin of these DNA fragments from their epigenetic features has implications in various clinical contexts. Open chromatin regions (OCRs) are important epigenetic features of DNA that reflect the cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. Integration of genomic and epigenomic features for cancer detection by liquid biopsy has previously been reported. However, many multimodal analyses require large amounts of cfDNA input and/or multiple types of experiments to cover the genomic and epigenomic aspects of a single sample, which is cost- and time-prohibitive. Methods that capture genomic and epigenomic profiles in a single experiment type with low input requirements are therefore important. Predicting OCRs from whole genome sequencing (WGS) data is one such approach. Here, we apply a correlation clustering algorithm to predict OCRs, using local sequencing depth as input. The processing steps are, in order: count normalization, discrete Fourier transform conversion, graph construction, graph cut optimization by linear programming, and clustering. To validate the proposed method, we compared our predictions (OCR vs. non-OCR) with previously validated open chromatin regions for human blood samples in ATAC-db. The overlap between the two exceeds 67%.
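The processing steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-based normalization, the use of DFT magnitudes as features, the Pearson-correlation threshold of 0.5, and all function names are assumptions made for illustration. The paper optimizes the graph cut by linear programming; this sketch swaps in a simple greedy pivot heuristic for correlation clustering to stay dependency-free.

```python
import cmath
import math


def normalize_counts(counts):
    """Divide each bin's depth by the window mean (an assumed normalization)."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts] if mean > 0 else [0.0] * len(counts)


def dft_magnitudes(signal):
    """Magnitudes of DFT coefficients 1..n//2 (the DC term is skipped)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(1, n // 2 + 1)
    ]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def pivot_clustering(n, signs):
    """Greedy pivot heuristic (stand-in for the LP-based graph cut).

    The first unassigned node becomes the pivot and absorbs every
    unassigned node joined to it by a positive edge.
    """
    unassigned = list(range(n))
    clusters = []
    while unassigned:
        pivot = unassigned[0]
        cluster = [pivot] + [v for v in unassigned[1:] if signs[(pivot, v)] > 0]
        clusters.append(cluster)
        unassigned = [v for v in unassigned if v not in cluster]
    return clusters


def cluster_windows(windows, threshold=0.5):
    """Normalize, transform, build a signed graph, then cluster."""
    features = [dft_magnitudes(normalize_counts(w)) for w in windows]
    n = len(features)
    # Signed complete graph: +1 for similar spectra, -1 otherwise (i < j).
    signs = {
        (i, j): 1 if pearson(features[i], features[j]) >= threshold else -1
        for i in range(n) for j in range(i + 1, n)
    }
    return pivot_clustering(n, signs)


def toy_window(base, amp, cycles, n_bins=16):
    """Synthetic depth profile with a single periodic component."""
    return [base + amp * math.cos(2 * math.pi * cycles * t / n_bins)
            for t in range(n_bins)]


# Four synthetic windows: two with 2 cycles per window, two with 4.
windows = [toy_window(10, 4, 2), toy_window(10, 3, 4),
           toy_window(20, 2, 2), toy_window(20, 5, 4)]
clusters = cluster_windows(windows)
print(clusters)
```

The sketch only groups windows with similar depth spectra; deciding which cluster corresponds to OCRs, and measuring overlap with a reference set such as ATAC-db, would be separate steps not shown here.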
Genomics, Molecular Networks