High accuracy methylation identification tools on single molecular level for PacBio HiFi data

Ying Chen, Bo Wu, Yu-Ying Ding, Long-Jian Niu, Xin Bai, Zhuo-Bin Lin, Chuan-Le Xiao
DOI: https://doi.org/10.1101/2024.08.14.607879
2024-08-18
Abstract: PacBio’s Circular Consensus Sequencing (CCS) allows us to obtain highly accurate bases and simultaneously determine the methylation states of individual molecules. However, existing CCS-based methods for 5mC detection have low accuracy (<90% on most datasets) at the single-molecule level and can produce inaccurate methylation patterns. These methods rely on the information from 21 bp contexts surrounding the target CpGs and have over 29% low-confidence (<75% accuracy) calls at CpGs with less distinguishable signals. We hypothesize that incorporating CpG methylation correlation information at the single-molecule level could improve the methylation calls on low-confidence CpGs. Here, we present a novel deep graph convolutional network (hifimeth) that uses 400 bp context in CCS-based 5mC calling and show that its improved performance is mainly due to the inclusion of more neighboring CpGs in contexts. Hifimeth achieves an average single-molecule accuracy of 94.7% and an average F1 score of 94.2%, 5.5% and 5.9% higher than the previous state-of-the-art method, respectively. Hifimeth-based methylation frequency quantification by read counting outperforms previous methods on all human and zebrafish datasets tested. The results also show that hifimeth’s high-accuracy calls can reveal complex single-molecule methylation patterns, either related to haplotypes or repeat regions, with up to single-motif resolution.
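As an illustration of what "methylation frequency quantification by read counting" generally means, the following minimal Python sketch computes, for each CpG site, the fraction of covering reads called methylated. It is not hifimeth's code: the input layout (read ID, CpG position, methylation probability), the function name `methylation_frequency`, and the 0.5 probability threshold are all assumptions made for this example.

```python
from collections import defaultdict


def methylation_frequency(calls, threshold=0.5):
    """Per-site methylation frequency by read counting.

    `calls` is an iterable of (read_id, cpg_position, prob_methylated)
    tuples. This layout and the 0.5 threshold are illustrative
    assumptions, not hifimeth's actual output format or cutoff.
    """
    methylated = defaultdict(int)  # reads called methylated at each site
    total = defaultdict(int)       # reads covering each site
    for _read_id, pos, prob in calls:
        total[pos] += 1
        if prob >= threshold:
            methylated[pos] += 1
    # Frequency = methylated reads / all reads covering the site.
    return {pos: methylated[pos] / total[pos] for pos in total}


# Toy single-molecule calls (made-up values for demonstration only).
calls = [
    ("read1", 100, 0.9),
    ("read2", 100, 0.2),
    ("read3", 100, 0.8),
    ("read1", 150, 0.1),
    ("read2", 150, 0.3),
]
freqs = methylation_frequency(calls)
```

With these toy calls, two of the three reads at position 100 pass the threshold and neither read at position 150 does. Because each read contributes one binary call per site, errors in single-molecule calls carry directly into the site-level frequency.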
Bioinformatics