Abstract: Many DNA methylome profiling methods cannot distinguish between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Since 5mC typically acts as a repressive mark whereas 5hmC is an intermediate form during active demethylation, the inability to separate their signals could lead to incorrect interpretation of the data. Meanwhile, many analysis pipelines quantify methylation level by the count or ratio of methylated reads, but the proportion of discordant reads (PDR) has recently been proposed to be a better indicator of gene expression level. Is the amount of extra information contained in 5hmC signals and PDR worth the additional experimental and computational costs? Here we combine whole-genome bisulfite sequencing (WGBS) and oxidative WGBS (oxWGBS) data from normal human lung and liver tissues and their paired tumors to investigate the quantitative relationships between gene expression and signals of the two forms of DNA methylation at promoters, transcript bodies, and immediate downstream regions. We find that 5mC and 5hmC signals correlate with gene expression in the same direction in most samples, but considering both types of signals increases the accuracy of expression levels inferred from methylation data by a median of 18.2% compared to having only standard WGBS data, showing that the two forms of methylation provide complementary information about gene expression. In addition, differential analysis between matched tumor and normal pairs is particularly affected by the superposition of 5mC and 5hmC signals in WGBS data, with at least 25-40% of the differentially methylated regions (DMRs) identified from 5mC signals not detected from WGBS data. We do not find PDR to be more informative about expression levels than the ratio of methylated reads, and integrating the two types of methylation features improves the accuracy of inferred expression levels by at most 9.8%. Our results also confirm the previous finding that methylation signals at transcript bodies are more indicative of gene expression levels than promoter methylation signals, and further show that, in addition to the first exon, methylation signals at the last exon and internal introns also contain non-redundant information about gene expression. Overall, our study provides concrete data for evaluating the cost-effectiveness of some experimental and analysis options in the study of DNA methylation in normal and cancer samples.
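The abstract names three quantities without defining them: the ratio of methylated reads, PDR, and the separation of 5hmC from 5mC using paired WGBS and oxWGBS data. The Python sketch below illustrates one common way to compute each. It is not the authors' pipeline. It assumes that each read is a list of per-CpG calls (1 methylated, 0 unmethylated, None not covered), that a read counts as discordant when it carries both methylated and unmethylated CpGs and covers at least four CpGs (a threshold often used in the PDR literature), and that 5hmC can be approximated as the WGBS level minus the oxWGBS level, clipped at zero. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence

Read = Sequence[Optional[int]]  # per-CpG call: 1, 0, or None (not covered)


def methylation_ratio(reads: List[Read], site: int) -> Optional[float]:
    """Fraction of reads covering `site` that are methylated at that site."""
    calls = [r[site] for r in reads if r[site] is not None]
    return sum(calls) / len(calls) if calls else None


def is_discordant(read: Read, min_cpgs: int = 4) -> Optional[bool]:
    """True if the read mixes methylated and unmethylated CpGs.

    Returns None when the read covers fewer than `min_cpgs` CpGs
    (assumed threshold; such reads are skipped in the PDR).
    """
    calls = [c for c in read if c is not None]
    if len(calls) < min_cpgs:
        return None
    return 0 < sum(calls) < len(calls)


def pdr(reads: List[Read], site: int, min_cpgs: int = 4) -> Optional[float]:
    """Proportion of discordant reads among eligible reads covering `site`."""
    flags = [is_discordant(r, min_cpgs) for r in reads if r[site] is not None]
    flags = [f for f in flags if f is not None]
    return sum(flags) / len(flags) if flags else None


def estimate_5hmc(wgbs_level: float, oxwgbs_level: float) -> float:
    """Approximate 5hmC as WGBS (5mC + 5hmC) minus oxWGBS (5mC only).

    Negative differences, which can arise from sampling noise, are clipped
    to zero; this clipping is an assumption of the sketch.
    """
    return max(0.0, wgbs_level - oxwgbs_level)


# Toy region with four CpGs and four reads.
reads = [
    [1, 1, 1, 1],  # fully methylated, concordant
    [0, 0, 0, 0],  # fully unmethylated, concordant
    [1, 0, 1, 1],  # discordant
    [1, 1, 0, 0],  # discordant
]
print(methylation_ratio(reads, 0))  # 0.75
print(pdr(reads, 0))                # 0.5
print(estimate_5hmc(0.75, 0.5))     # 0.25
```

In the toy region, the first CpG has a methylation ratio of 0.75 and a PDR of 0.5. A site with the same ratio could have a PDR of zero if every read were fully methylated or fully unmethylated, which is why the two features can carry different information.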

D-GPM: A Deep Learning Method for Gene Promoter Methylation Inference.

DeepPGD: A Deep Learning Model for DNA Methylation Prediction Using Temporal Convolution, BiLSTM, and Attention Mechanism

DeepMethylation: a deep learning based framework with GloVe and Transformer encoder for DNA methylation prediction

BiLSTM-5mC: A Bidirectional Long Short-Term Memory-Based Approach for Predicting 5-Methylcytosine Sites in Genome-Wide DNA Promoters

Wemics: A Single‐Base Resolution Methylation Quantification Method for Enhanced Prediction of Epigenetic Regulation

Predicting MGMT Promoter Methylation in Diffuse Gliomas Using Deep Learning with Radiomics

New Guidelines for DNA Methylome Studies Regarding 5-Hydroxymethylcytosine for Understanding Transcriptional Regulation.

A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties

Noninvasive Lung Cancer Early Detection via Deep Methylation Representation Learning.

A comprehensive comparison of residue-level methylation levels with the regression-based gene-level methylation estimations by ReGear

MethylGPT: a foundation model for the DNA methylome

Higher order methylation features for clustering and prediction in epigenomic studies

DNA Methylation Markers for Pan-Cancer Prediction by Deep Learning

DNA Methylation Loci Identification for Pan-Cancer Early-Stage Diagnosis and Prognosis Using a New Distributed Parallel Partial Least Squares Method.

Comprehensive Analysis of MGMT Promoter Methylation: Correlation with MGMT Expression and Clinical Response in GBM.

Computational pathology-based weakly supervised prediction model for MGMT promoter methylation status in glioblastoma

A novel bisulfite-microfluidic temperature gradient capillary electrophoresis platform for highly sensitive detection of gene promoter methylation

DeepH&M: Estimating single-CpG hydroxymethylation and methylation levels from enrichment and restriction enzyme sequencing methods

DNA Methylation Markers for Diagnosis and Prognosis of Common Cancers.

Mining the Selective Remodeling of DNA Methylation in Promoter Regions to Identify Robust Gene-Level Associations With Phenotype

CDPNet: a radiomic feature learning method with epigenetic application to estimating MGMT promoter methylation status in glioblastoma