Abstract: Nm (2´-O-methylation) is one of the most abundant modifications of mRNAs and non-coding RNAs. It occurs when a methyl group (–CH3) is added to the 2´ hydroxyl (–OH) of the ribose moiety. Because every ribose sugar carries this hydroxyl group, the modification can appear on any nucleotide regardless of its nitrogenous base. Nm modification contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by DXO, and the biogenesis and specificity of rRNA. Recently, the single-molecule long-read RNA sequencing offered by Oxford Nanopore Technologies has enabled the direct detection of RNA modifications on the molecule being sequenced, but to our knowledge only one previous study has applied this technology to predict the stoichiometry of Nm-modified sites, and it did so in RNA sequences of yeast cells. In this paper, we extend this research direction by proposing a bio-computational framework, Nm-Nano, for predicting Nm sites in Nanopore direct RNA sequencing reads of human cell lines, whose transcriptomes are larger and more complex than that of yeast. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites in Nanopore sequencing data: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with k-mer embedding. The XGBoost model is trained on features extracted from the modified and unmodified Nanopore signals and their corresponding k-mers, which come from the underlying RNA sequence reported by base-calling. The RF model is trained on the same set of features plus a dense vector representation of the RNA k-mers generated by the word2vec technique. Results on two benchmark datasets generated from Nanopore RNA sequencing data of the HeLa and HEK293 human cell lines show strong performance of Nm-Nano.
In independent validation testing, where each model was trained on the HeLa benchmark dataset and tested on the HEK293 benchmark dataset, Nm-Nano identified Nm sites with an accuracy of 93% using XGBoost and 88% using RF with k-mer embedding. Deploying Nm-Nano to predict Nm sites in the HeLa cell line identified 196 genes as the most frequently Nm-modified among all genes carrying Nm sites in this cell line. Functional and gene set enrichment analysis of these genes shows significant enrichment of a wide range of functional processes in the HeLa cell line, including high-confidence (adjusted p-value < 0.05) enriched ontologies most representative of the role of Nm modification in immune response and cellular homeostasis. Similarly, deploying Nm-Nano on the HEK293 cell line identified 176 genes as the most frequently Nm-modified in this cell line. Functional and gene set enrichment analysis of these genes shows significant enrichment of a wide range of functional processes in the HEK293 cell line, such as “MHC class 1 protein complex”, “mitotic spindle assembly”, “response to glucocorticoid”, and “nucleocytoplasmic transport”. The source code of Nm-Nano can be
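
To make the feature description in the abstract concrete, the following is a minimal sketch of how one Nanopore signal segment and its base-called k-mer might be turned into a numeric feature vector for a classifier. The abstract does not list the exact features, so everything here is an assumption for illustration: the function names `one_hot_kmer` and `event_features`, the choice of summary statistics (mean, standard deviation, median, segment length), and the one-hot k-mer encoding, which is a simple stand-in and not the word2vec embedding used with the RF model.

```python
from statistics import mean, median, pstdev

BASES = "ACGU"  # RNA alphabet; a base-called "T" is mapped to "U" below


def one_hot_kmer(kmer):
    """Encode an RNA k-mer as a flat one-hot vector (4 entries per base).

    Assumption: one-hot encoding is used here only for illustration;
    it is not the word2vec embedding described in the abstract.
    """
    vec = []
    for base in kmer.upper().replace("T", "U"):
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r} in k-mer {kmer!r}")
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def event_features(signal, kmer):
    """Return summary statistics of one signal segment plus its k-mer encoding.

    Assumption: the statistics chosen here (mean, population standard
    deviation, median, number of samples) are hypothetical examples.
    """
    if not signal:
        raise ValueError("signal segment is empty")
    stats = [mean(signal), pstdev(signal), median(signal), float(len(signal))]
    return stats + one_hot_kmer(kmer)


# Hypothetical segment: three current samples aligned to one 5-mer.
features = event_features([80.0, 82.0, 84.0], "AUGCA")
```

Vectors built this way for modified and unmodified sites could then be stacked into a training matrix for any supervised classifier, with one cell line used for training and the other held out for testing.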

Transcriptome-wide single molecule mapping of 2´-O-Methylation (Nm) sites in Nanopore direct RNA sequencing datasets using the Nm-Nano framework

Nm-Nano: A Machine Learning Framework for Transcriptome-Wide Single Molecule Mapping of 2´-O-Methylation (Nm) Sites in Nanopore Direct RNA Sequencing Datasets

Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets

Nm-seq maps 2'-O-methylation sites in human mRNA with base precision

Single base resolution mapping of 2'-O-methylation sites in human mRNA and in 3' terminal ends of small RNAs

An Integrative Platform for Detection of RNA 2′-O-methylation Reveals Its Broad Distribution on mRNA

Nanopore-based native RNA sequencing of human transcriptomes reveals the complexity of mRNA modifications and crosstalk between RNA regulatory features

NanoMUD: Profiling of pseudouridine and N1-methylpseudouridine using Oxford Nanopore direct RNA sequencing

Sequencing-free Analysis of Multiple Methylations on Gene-Specific mRNAs

A comprehensive survey of RNA modifications in a human transcriptome

Single-base Resolution Mapping of 2′-O-methylation Sites by an Exoribonuclease-Enriched Chemical Method

Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing

NmSEER: A Prediction Tool for 2'-O-methylation (Nm) Sites Based on Random Forest

2'-O-methylation in RNA: progress, challenges, and future directions

Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing

NmSEER V2.0: a prediction tool for 2'-O-methylation sites based on random forest and multi-encoding combination

Identifying N6-Methyladenosine Sites in HepG2 Cell Lines Using Oxford Nanopore Technology

Identifying RNA N6-Methyladenine Sites in Three Species Based on a Markov Model

Comparative analysis of 43 distinct RNA modifications by nanopore tRNA sequencing

Comprehensive Review and Assessment of Computational Methods for Predicting RNA Post-Transcriptional Modification Sites from RNA Sequences

De novo basecalling of m6A modifications at single molecule and single nucleotide resolution