SNN6mA: Improved DNA N6-methyladenine site prediction using Siamese network-based feature embedding

Xuan Yu, Jun Hu, Ying Zhang
DOI: https://doi.org/10.1016/j.compbiomed.2023.107533
IF: 7.7
2023-10-03
Computers in Biology and Medicine
Abstract: DNA N6-methyladenine (6mA) is one of the most common and abundant modifications and plays essential roles in various biological processes and cellular functions. Therefore, the accurate identification of DNA 6mA sites is of great importance for a better understanding of its regulatory mechanisms and biological functions. Although significant progress has been made, there is still room for further improvement in 6mA site prediction in DNA sequences. In this study, we report a smart but accurate 6mA predictor, termed SNN6mA, based on a Siamese network. Specifically, DNA segments are first encoded into feature vectors using the one-hot encoding scheme; then, these original feature vectors are mapped to a low-dimensional embedding space derived from the Siamese network to capture more discriminative features; finally, the obtained low-dimensional features are fed to a fully connected neural network to perform the final prediction. Stringent benchmarking tests on the datasets of two species demonstrated that the proposed SNN6mA is superior to the state-of-the-art 6mA predictors. Detailed data analyses show that the major advantage of SNN6mA lies in its use of the Siamese network, which can map the original features into a low-dimensional embedding space with more discriminative capability. In summary, the proposed SNN6mA is the first attempt to use a Siamese network for 6mA site prediction and could be easily extended to predict other types of modifications. The code and datasets used in the study are freely available at https://github.com/YuXuan-Glasgow/SNN6mA for academic use.
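The first and second steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base order `ACGT`, the all-zero encoding for unknown bases, the function names, and the use of a contrastive loss with a margin of 1.0 are all assumptions made for illustration, since the abstract does not specify them. The embedding network itself is omitted; the loss simply shows how a Siamese objective pulls same-class pairs together and pushes different-class pairs apart.

```python
import math

# Assumed base order; the abstract does not state one.
BASES = "ACGT"


def one_hot_encode(segment):
    """Flatten a DNA segment into a one-hot vector of length 4 * len(segment).

    Characters outside ACGT (e.g. 'N') become four zeros (an assumption).
    """
    vec = []
    for base in segment.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss for one pair of embeddings (hypothetical objective).

    Same-class pairs are penalised by their squared distance;
    different-class pairs are penalised only when closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Usage: encode a short segment and score a toy pair of 1-D embeddings.
features = one_hot_encode("AC")
loss = contrastive_loss([0.0], [0.5], same_class=False)
```

In a full model, the two inputs of each pair would pass through the same embedding network with shared weights before the loss is computed, and the resulting embeddings would then feed the fully connected classifier.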
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology