Endogenous labeling empowers accurate detection of m6A from single long reads of direct RNA sequencing

Wenbing Guo, Zhijun Ren, Xiang Huang, Jialiang He, Jie Zhang, Zehong Wu, Yang Guo, Zijun Zhang, Yixian Cun, Jinkai Wang
DOI: https://doi.org/10.1101/2024.01.30.577990
2024-01-31
Abstract: Although many machine learning models have been developed to detect m6A RNA modification sites using the electric current signals of ONT direct RNA sequencing (DRS) reads, the landscape of m6A on different RNA isoforms is still a mystery because these models have limited capacity to distinguish m6A on individual long reads and RNA isoforms. The primary challenge in training a model with single-read accuracy is the difficulty of obtaining training data from individual DRS reads that comprehensively represent m6A on endogenous RNAs. Here, we endogenously label the methylated m6A sites on single ONT DRS reads by APOBEC1-YTH induced C-to-U mutations, strategically positioned 10-100 nt away from the known m6A sites on the same reads. Adopting a semi-supervised learning strategy, we obtain 700,438 reliable 5-mer single-read level m6A signals, providing a comprehensive representation of m6A on endogenous RNAs. Leveraging this dataset, we develop m6Aiso, a deep residual neural network model that not only accurately identifies and quantifies known m6A sites but also reveals unknown, subtly methylated m6A sites responsive to METTL3 depletion. Analyzing m6Aiso-determined m6A on single reads and isoforms uncovers distance-dependent linkages of m6A sites along single molecules, as well as differential methylation of identical m6A sites on different isoforms. Moreover, we find widespread, functionally important dynamic changes of m6A sites on specific isoforms during epithelial-mesenchymal transition (EMT). The endogenous labeling strategy empowers m6Aiso to achieve remarkable precision in pinpointing m6A on individual molecules and underscores its effectiveness in elucidating the intricate dynamics and complexities of m6A across RNA isoforms.
Bioinformatics