Abstract: Nm (2´-O-methylation) is one of the most abundant modifications of mRNAs and non-coding RNAs, occurring when a methyl group (–CH3) is added to the 2´ hydroxyl (–OH) of the ribose moiety. Because every ribose sugar carries this hydroxyl group, the modification can occur on any nucleotide regardless of its nitrogenous base. Nm contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by DXO, and the biogenesis and specificity of rRNA. Recently, the single-molecule long-read RNA sequencing offered by Oxford Nanopore Technologies has enabled the direct detection of RNA modifications on the molecule being sequenced, but to our knowledge only one previous study has applied this technology to predict the stoichiometry of Nm-modified sites, and it did so in RNA sequences of yeast cells. In this paper, we extend this research direction by proposing a bio-computational framework, Nm-Nano, for predicting Nm sites in Nanopore direct RNA sequencing reads of human cell lines, which are more complex and larger than yeast. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites in Nanopore sequencing data: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with k-mer embedding. The XGBoost model is trained on features extracted from the modified and unmodified Nanopore signals and their corresponding k-mers, which come from the underlying RNA sequence reported by base-calling. The RF model is trained on the same set of features, plus a dense vector representation of the RNA k-mers generated by the word2vec technique. On two benchmark datasets generated from Nanopore RNA sequencing data of the HeLa and HEK293 human cell lines, Nm-Nano achieves high performance.
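To make the feature description above concrete, the following is a minimal sketch of how one signal event and its k-mer might be turned into a single feature row. It is an illustration under stated assumptions, not the Nm-Nano implementation: the summary statistics chosen, the one-hot k-mer encoding, the `ACGU` alphabet, and the names `one_hot_kmer`, `signal_features`, and `feature_row` are all hypothetical. The word2vec embedding used by the RF model is not shown.

```python
from statistics import mean, median, pstdev

# Assumed RNA alphabet; real base-called k-mers may use T instead of U.
BASES = "ACGU"


def one_hot_kmer(kmer):
    """One-hot encode a k-mer as a flat list with four 0/1 values per base.

    A base outside BASES (for example N) is encoded as four zeros.
    """
    encoding = []
    for base in kmer:
        encoding.extend(1 if base == b else 0 for b in BASES)
    return encoding


def signal_features(samples):
    """Summarise the raw current samples of one event (hypothetical choice)."""
    return [mean(samples), median(samples), pstdev(samples), float(len(samples))]


def feature_row(kmer, samples):
    """Concatenate signal statistics and the k-mer encoding into one row."""
    return signal_features(samples) + one_hot_kmer(kmer)


# Hypothetical event: a 5-mer and four current measurements (illustrative values).
row = feature_row("GAUCA", [81.2, 79.8, 80.5, 82.1])
print(len(row))  # 4 signal statistics + 5 bases * 4 one-hot values
```

Rows built this way, one per candidate site and labelled as modified or unmodified, would form the training table for a supervised classifier.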
In independent validation testing, Nm-Nano identified Nm sites with accuracies of 93% and 88% using the XGBoost and RF with k-mer embedding models, respectively, when each model was trained on the HeLa benchmark dataset and tested on the HEK293 benchmark dataset. Deploying Nm-Nano to predict Nm sites in the HeLa cell line identified 196 genes as the most frequently Nm-modified among all genes carrying Nm sites in this cell line. Functional and gene set enrichment analysis of these genes shows significant enrichment of a wide range of functional processes in the HeLa cell line, with high-confidence (adjusted p-value < 0.05) enriched ontologies that are representative of the role of Nm modification in immune response and cellular homeostasis. Similarly, deploying Nm-Nano to predict Nm sites in the HEK293 cell line identified 176 genes as the most frequently Nm-modified genes in this cell line. Functional and gene set enrichment analysis of these genes shows significant enrichment of a wide range of functional processes in the HEK293 cell line, such as “MHC class 1 protein complex”, “mitotic spindle assembly”, “response to glucocorticoid”, and “nucleocytoplasmic transport”. The source code of Nm-Nano can be
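The independent validation protocol described above (fit on one benchmark dataset, score on a different one) can be sketched as follows. This is a toy illustration only: the data are synthetic, and a nearest-centroid classifier is swapped in for XGBoost and RF so that the example needs nothing beyond the standard library. The names `make_dataset`, `fit_centroids`, `predict`, and `accuracy` are hypothetical, and the resulting accuracy says nothing about Nm-Nano's reported figures.

```python
import random


def make_dataset(n, shift, seed):
    """Build a synthetic stand-in for a benchmark set of (features, label) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = modified site, 0 = unmodified site
        # Modified sites get a shifted first feature; purely illustrative.
        x = [rng.gauss(80.0 + shift * label, 1.0), rng.gauss(2.0, 0.3)]
        data.append((x, label))
    return data


def fit_centroids(data):
    """Stand-in model: the per-class mean of every feature."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(centroids, data):
    """Fraction of sites whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == y for x, y in data)
    return correct / len(data)


# Train on one synthetic "cell line", then evaluate on a separately generated one.
train_set = make_dataset(400, shift=4.0, seed=0)
test_set = make_dataset(400, shift=4.0, seed=1)
model = fit_centroids(train_set)
acc = accuracy(model, test_set)
print(f"independent-validation accuracy: {acc:.2f}")
```

The point of the design is that the held-out set never influences the fitted model, so the score reflects how well a model generalises across datasets rather than how well it fits its training data.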

Nm-Nano: A Machine Learning Framework for Transcriptome-Wide Single Molecule Mapping of 2´-O-Methylation (Nm) Sites in Nanopore Direct RNA Sequencing Datasets

