Germline CpG methylation signatures in the human population inferred from genetic polymorphism

Yichen Si, Hyun Min Kang, Sebastian Zöllner
DOI: https://doi.org/10.1101/2023.03.24.534151
2024-01-03
Abstract: Understanding DNA methylation patterns in the human genome is a key step toward deciphering gene regulatory mechanisms and modeling mutation rate heterogeneity. We analyzed existing whole genome bisulfite sequencing (WGBS) data across tissues together with large genetic variation catalogs and observed that 93.2% of CpGs hyper-methylated in sperm are polymorphic. Moreover, the methylation status of CpGs is spatially correlated: 94% of CpG pairs within 1 kb share the same methylation status. Leveraging only these two properties, we infer germline CpG methylation in the human population using a new method, the Methylation Hidden Markov Model (MHMM), and polymorphism data from TOPMed. Our inference is orthogonal to WGBS-based experimental results; still, we observed 90% concordance with human sperm WGBS while overcoming several challenges in that data. We inferred methylation status for ~721,000 CpG sites that were missing from WGBS due to low coverage, and we show that 42.2% of CpGs with allele frequency > 5% are hyper-methylated in the population but could not be captured in WGBS due to sample genetic variation. Our results provide a unique resource for CpG methylation levels in germline cells, complementary to the existing WGBS-based measures, and can thus be leveraged to enhance analyses such as annotating regulatory and inactivated genomic regions in the germline.
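To illustrate the idea of combining per-site polymorphism with spatial correlation, the following is a minimal two-state hidden Markov model decoded with the Viterbi algorithm. It is a toy sketch, not the authors' MHMM. The function name `viterbi_methylation`, the 0.3 polymorphism rate for hypo-methylated CpGs, the uniform prior, and the use of 0.94 as a fixed neighbor-to-neighbor "stay" probability are all assumptions made for illustration. Only the 93.2% and 94% figures come from the abstract, and the abstract does not say they enter the model in this form.

```python
import math

def viterbi_methylation(polymorphic, p_poly=(0.3, 0.932), p_stay=0.94,
                        prior=(0.5, 0.5)):
    """Decode a 0/1 polymorphism sequence into states (0 = hypo, 1 = hyper).

    polymorphic: one 0/1 flag per CpG, in genomic order.
    p_poly:      assumed P(polymorphic | state); the hypo value is hypothetical.
    p_stay:      assumed probability that adjacent CpGs share a state.
    """
    if not polymorphic:
        return []

    def emit(state, obs):
        p = p_poly[state]
        return math.log(p if obs else 1.0 - p)

    def trans(a, b):
        return math.log(p_stay if a == b else 1.0 - p_stay)

    # Log-score of the best path ending in each state at the first site.
    score = [math.log(prior[s]) + emit(s, polymorphic[0]) for s in range(2)]
    back = []  # back[t][s] = best previous state for state s at site t + 1

    for obs in polymorphic[1:]:
        new_score, pointers = [], []
        for s in range(2):
            best_prev = max(range(2), key=lambda r: score[r] + trans(r, s))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + trans(best_prev, s) + emit(s, obs))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(2), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

if __name__ == "__main__":
    flags = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    print(viterbi_methylation(flags))
```

Under these assumed parameters, a single non-polymorphic CpG inside a polymorphic run is smoothed over by the transition penalty, while a run of several non-polymorphic CpGs is enough to switch the decoded state to hypo-methylated.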
Genetics