Abstract:N4-methylcytosine (4mC) is a critical epigenetic modification that plays a pivotal role in the regulation of a multitude of biological processes, including gene expression, DNA replication, and cellular differentiation. Traditional experimental methods for detecting DNA N4-methylcytosine sites are time-consuming, labor-intensive, and costly, making them unsuitable for large-scale or high-throughput research. Computational methods for identifying DNA N4-methylcytosine sites enable the rapid and cost-effective analysis of DNA 4mC sites across entire genomes. In this study, we focus on the identification of DNA 4mC sites in the mouse genome. Although there are already some computational methods that can predict DNA 4mC sites in the mouse genome, there is still significant room for improvement in accurately predicting them due to their inability to fully capture the multifaceted characteristics of DNA sequences. To address this issue, we propose a new deep learning predictor called Mus4mCPred, which utilizes multi-view feature learning and deep hybrid networks for accurately predicting DNA 4mC sites in the mouse genome. The predictor Mus4mCPred firstly employed different encoding methods to extract the feature vectors of DNA sequences, then input these features generated by different encoding methods into various hybrid deep learning models for the learning and extraction of more sophisticated representations of these features, and finally fused the extracted multi-view features to serve as the final features for DNA 4mC site prediction in the mouse genome. Multi-view features enabled the more comprehensive capture of data characteristics, enhancing the feature representation of DNA sequences. The independent test results showed that the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews' correlation coefficient (MCC) were 0.7688, 0.9375, 0.8531, and 0.7165, respectively. The predictor Mus4mCPred outperformed other state-of-the-art methods, achieving the accurate identification of 4mC sites in the mouse genome.

Distance-based support vector machine to predict DNA N6-methyladenine modification

Prediction of nucleic acid-binding proteins using support vector machines

Deep6mAPred: A CNN and Bi-LSTM-based deep learning method for predicting DNA N6-methyladenosine sites across plant species

Prediction of Functional Class of Proteins and Peptides Irrespective of Sequence Homology by Support Vector Machines.

SoftVoting6mA: An improved ensemble-based method for predicting DNA N6-methyladenine sites in cross-species genomes

Leveraging the attention mechanism to improve the identification of DNA N6-methyladenine sites

SNN6mA: Improved DNA N6-methyladenine site prediction using Siamese network-based feature embedding

PSATF-6mA: an integrated learning fusion feature-encoded DNA-6 mA methylcytosine modification site recognition model based on attentional mechanisms

i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting

An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis

Ense-i6mA: Identification of DNA N6-methyl-adenine Sites Using XGB-RFE Feature Se-lection and Ensemble Machine Learning

Identification of DNA adduct formation of small molecules by molecular descriptors and machine learning methods

Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features

Mus4mCPred: Accurate Identification of DNA N4-Methylcytosine Sites in Mouse Genome Using Multi-View Feature Learning and Deep Hybrid Network

MM-6mAPred: Identifying DNA N6-methyladenine Sites Based on Markov Model

Identifying RNA N6-Methyladenine Sites in Three Species Based on a Markov Model

DLm6Am: A Deep-Learning-Based Tool for Identifying N6,2'-O-Dimethyladenosine Sites in RNA Sequences

A deep multiple kernel learning-based higher-order fuzzy inference system for identifying DNA N4-methylcytosine sites

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation

Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites

Identification of DNA N6-methyladenine sites by integration of sequence features