Abstract

Background
As a common and abundant RNA methylation modification, N6-methyladenosine (m6A) is widespread in the transcriptomes of many species and is closely related to the occurrence and development of various life processes and diseases. Accurate identification of m6A methylation sites has therefore become a hot topic. Most biological methods rely on high-throughput sequencing technology, which places great demands on sequencing library preparation and data analysis. Various machine learning methods have thus been proposed that extract different types of sequence-based features and then employ conventional classifiers, such as SVM and RF, to identify m6A methylation sites. However, identification performance relies heavily on the extracted features, which still need to be improved.

Results
This paper studies feature extraction and classification of m6A methylation sites from a natural language processing perspective, integrating feature extraction and classification in a single model while taking into account the upstream and downstream information of m6A sites. One-hot encoding, RNA word embedding, and Word2vec are adopted to depict sites from the perspectives of the base itself and of its upstream and downstream sequence. A BiLSTM model, a well-known sequence model, is then constructed to discriminate sequences with potential m6A sites. Since the three feature extraction methods focus on different perspectives of m6A sites, an ensemble deep learning predictor (EDLm6APred) is finally constructed for m6A site prediction. Experimental results on human and mouse data sets show that EDLm6APred outperforms each of the single-encoding models, indicating that base, upstream, and downstream information are all essential for m6A site detection. Compared with existing m6A methylation site prediction models that do not use genomic features, EDLm6APred obtains an area under the receiver operating characteristic curve of 86.6% on the human data sets, indicating the effectiveness of sequential modeling on RNA. For user convenience, a web server implementing EDLm6APred was developed and made publicly available at www.xjtlu.edu.cn/biologicalsciences/EDLm6APred .

Conclusions
The proposed EDLm6APred method is a reliable predictor for m6A methylation sites.
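
The input encodings and the ensemble step mentioned in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the base ordering "ACGU", the 3-mer "word" length, and the unweighted averaging of per-model probabilities are all assumptions made for illustration, since the abstract does not specify these details. The BiLSTM classifiers themselves are omitted; the sketch only shows how a sequence window could be turned into one-hot vectors or k-mer tokens (the usual input to a word-embedding or Word2vec layer) and how three model outputs could be combined.

```python
from typing import List

BASES = "ACGU"  # RNA alphabet; this ordering is an arbitrary choice for the sketch


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-dimensional indicator vector (unknown bases become all zeros)."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(BASES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers ("RNA words") with stride 1.

    k = 3 is an assumed value, not taken from the paper.
    """
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def average_ensemble(probabilities: List[float]) -> float:
    """Combine per-model m6A probabilities by unweighted averaging (assumed rule)."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    return sum(probabilities) / len(probabilities)


if __name__ == "__main__":
    window = "GGACU"  # toy window around a candidate adenosine
    print(one_hot(window))
    print(kmer_tokens(window))
    # Hypothetical outputs of three separately trained models
    print(average_ensemble([0.9, 0.7, 0.8]))
```

In a full pipeline, each encoding would feed its own sequence model, and the averaged probability would be thresholded to call a site; the threshold and any model weighting would need to be tuned on validation data.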
