Fine Mapping Regulatory Variants by Characterizing Native CpG Methylation with Nanopore Long Read Sequencing

Yijun Tian, Shannon McDonnell, Lang Wu, Nicholas Larson, Liang Wang
DOI: https://doi.org/10.1101/2024.09.27.614715
2024-09-28
Abstract: 5-methylcytosine (5mC) is the most common chemical modification at CpG sites across the human genome. Bisulfite conversion combined with short-read whole-genome sequencing can capture and quantify this modification at single-nucleotide resolution. However, PCR amplification can produce duplicated methylation patterns and introduce 5mC detection bias. In addition, the limited read length restricts co-methylation analysis between distant CpG sites. Bisulfite conversion also makes variant-specific methylation difficult to detect, because it destroys allele information in the sequencing reads. To address these issues, we characterized human methylation profiles with nanopore long-read sequencing, aiming to demonstrate its potential for long-range co-methylation analysis using native modification calls with allele information retained. We first analyzed nanopore demo data from an adaptive sampling sequencing run targeting all human CpG islands. We applied the linkage disequilibrium (LD) r² statistic to quantify co-methylation in the nanopore data and identified 27,875, 50,481, 26,542 and 51,189 methylation haplotype blocks (MHBs) in the COLO829, COLO829BL, HCC1395 and HCC1395BL cell lines, respectively. Although most co-methylation occurred at short range (≤200 bp), a small portion (1–3%) spanned long distances (≥1,000 bp), suggesting potential remote regulatory mechanisms across the genome. To characterize epigenetic changes related to transcription factor binding, we profiled 5mC percentage changes surrounding motif sites in the JASPAR collection and found that CTCF and KLF5 binding sites showed reduced methylation, while FOXE1 and ZNF354A sites showed increased methylation.
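The LD-style co-methylation statistic can be illustrated with a small sketch. This is a minimal example, not the authors' implementation: it assumes each read spanning two CpG sites contributes one pair of binary calls (1 = methylated, 0 = unmethylated). The function name `comethylation_r2` and this input format are hypothetical. Read filtering, modification-probability thresholds and the rules for merging CpGs into MHBs are not given in the abstract and are not shown. With per-site methylated fractions pA and pB and joint fraction pAB, the sketch computes D = pAB − pA·pB and r² = D² / (pA(1 − pA)·pB(1 − pB)).

```python
from typing import Optional, Sequence, Tuple


def comethylation_r2(calls: Sequence[Tuple[int, int]]) -> Optional[float]:
    """LD-style r^2 between two CpG sites (hypothetical helper).

    `calls` holds one (site_a, site_b) pair per read spanning both sites,
    with 1 = methylated and 0 = unmethylated (assumed input format).
    Returns None when r^2 is undefined because a site is invariant.
    """
    n = len(calls)
    if n == 0:
        return None
    p_a = sum(a for a, _ in calls) / n                # methylated fraction at site A
    p_b = sum(b for _, b in calls) / n                # methylated fraction at site B
    p_ab = sum(1 for a, b in calls if a and b) / n    # both methylated on the same read
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    return d * d / denom


# Example: every read is either methylated at both sites or unmethylated at both
print(comethylation_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))  # 1.0
```

Perfectly concordant reads give r² = 1 and independent calls give r² = 0. The statistic is undefined (None here) when either site is fully methylated or fully unmethylated across the reads.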
To further investigate allele-specific 5mCG in the prostate genome, we designed a target region covering methylation quantitative trait loci (mQTL) and genome-wide association study (GWAS) risk germline variants, and generated long reads from an adaptive sampling run in the 22Rv1 prostate cancer cell line. To identify allele-specific methylation, we performed long-read-based phasing and compared 5mCG signals between the two haplotypes, identifying 6,390 haplotype-specific methylated regions (Mann–Whitney U test p ≤ 1e-5 and methylation difference ≥ 50%). By examining haplotype-specific methylated regions near the phasing variants, we identified allele-specific methylated regions that also showed allele-specific accessibility in ATAC-seq data. Integrating the 22Rv1 ATAC-seq data further showed that methylation levels were negatively correlated with chromatin accessibility at the genome-wide scale. Our study demonstrates native methylome profiling that preserves haplotype information, offering a novel approach to uncovering the regulatory mechanisms of the human prostate genome.
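The haplotype comparison can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each phased read overlapping a region is summarized as a methylation percentage (0–100). It applies a two-sided Mann–Whitney U test with a normal approximation (tie- and continuity-corrected) and defines delta as the absolute difference in mean methylation between haplotypes. The function names and this definition of delta are assumptions; only the thresholds (p ≤ 1e-5, delta ≥ 50%) come from the text.

```python
import math
from typing import Dict, List, Sequence


def _average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = list(x) + list(y)
    ranks = _average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = max((abs(u1 - mu) - 0.5) / math.sqrt(var), 0.0)  # continuity correction
    return math.erfc(z / math.sqrt(2))


def is_haplotype_specific(hp1: Sequence[float], hp2: Sequence[float],
                          p_max: float = 1e-5, min_delta: float = 50.0) -> bool:
    """Hypothetical region test: hp1/hp2 are per-read methylation % per haplotype."""
    if not hp1 or not hp2:
        return False
    delta = abs(sum(hp1) / len(hp1) - sum(hp2) / len(hp2))  # assumed definition
    return mann_whitney_p(hp1, hp2) <= p_max and delta >= min_delta


hp1 = [80.0 + i % 5 for i in range(30)]  # 30 reads, mostly methylated
hp2 = [5.0 + i % 5 for i in range(30)]   # 30 reads, mostly unmethylated
print(is_haplotype_specific(hp1, hp2))   # True
```

Requiring both conditions matters. Three reads per haplotype with a 90-point difference still give p ≈ 0.08 under this approximation, so they would not pass the p ≤ 1e-5 cutoff.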
Genetics