Abstract: Nm (2´-O-methylation) is one of the most abundant modifications of mRNAs and non-coding RNAs. It occurs when a methyl group (–CH3) is added to the 2´ hydroxyl (–OH) of the ribose moiety. Because every ribose sugar carries this hydroxyl group, the modification can appear on any nucleotide regardless of its nitrogenous base. Nm modification contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by DXO, and the biogenesis and specificity of rRNA. Recently, the single-molecule long-read RNA sequencing offered by Oxford Nanopore Technologies has enabled direct detection of RNA modifications on the molecule being sequenced, but to our knowledge only one previous study has applied this technology to predict the stoichiometry of Nm-modified sites, and it did so in RNA sequences of yeast cells. In this paper, we extend this research direction by proposing a bio-computational framework, Nm-Nano, for predicting Nm sites in Nanopore direct RNA sequencing reads of human cell lines, which are more complex and larger than yeast. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites in Nanopore sequencing data: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with k-mer embedding. The XGBoost model is trained on features extracted from the modified and unmodified Nanopore signals and their corresponding k-mers, which are taken from the underlying RNA sequence reported by base-calling. The RF model is trained on the same feature set plus a dense vector representation of the RNA k-mers generated by the word2vec technique. Results on two benchmark datasets generated from Nanopore RNA sequencing data of the Hela and Hek293 human cell lines show strong performance for Nm-Nano.
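To make the k-mer embedding step concrete, the following is a minimal sketch of how a base-called read could be tokenized into overlapping k-mers and turned into the (center, context) pairs that a skip-gram word2vec model consumes. It is an illustration under stated assumptions, not the Nm-Nano implementation: the function names `kmers` and `skipgram_pairs`, the k-mer length `k=5`, and the context window size are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Tuple


def kmers(seq: str, k: int = 5) -> List[str]:
    """Return all overlapping k-mers of a sequence, left to right.

    k=5 is an assumed value for illustration only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair every token with its neighbours inside a symmetric window.

    These (center, context) pairs are the training examples of a
    skip-gram word2vec model; each k-mer plays the role of a "word".
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    read = "AUGGCUAC"  # toy sequence, not real data
    tokens = kmers(read, k=5)
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
```

In a full pipeline, a word2vec model trained on such pairs would assign each k-mer a dense vector, which could then be concatenated with the signal-derived features before training the RF model.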
In independent validation testing, Nm-Nano identified Nm sites with accuracies of 93% and 88% using the XGBoost and RF with k-mer embedding models, respectively, when each model was trained on the Hela benchmark dataset and tested on the Hek293 benchmark dataset. Deploying Nm-Nano to predict Nm sites in the Hela cell line identified 196 genes as the most frequently Nm-modified among all genes carrying Nm sites in this cell line. Functional and gene set enrichment analysis of these genes shows significant enrichment of a wide range of functional processes in the Hela cell line, including high-confidence (adjusted p-value < 0.05) enriched ontologies that reflect the role of Nm modification in immune response and cellular homeostasis. Similarly, deploying Nm-Nano on the Hek293 cell line identified 176 genes as the most frequently Nm-modified in this cell line. Functional and gene set enrichment analysis of these genes shows significant enrichment of a wide range of functional processes in the Hek293 cell line, such as “MHC class 1 protein complex”, “mitotic spindle assembly”, “response to glucocorticoid”, and “nucleocytoplasmic transport”. The source code of Nm-Nano can be
