MaskDNA-PGD: An innovative deep learning model for detecting DNA methylation by integrating mask sequences and adversarial PGD training as a data augmentation method
Zhiwei Zheng, Nguyen Quoc Khanh Le, Matthew Chin Heng Chua
DOI: https://doi.org/10.1016/j.chemolab.2022.104715
IF: 4.175
2022-11-18
Chemometrics and Intelligent Laboratory Systems
Abstract: DNA methylation is associated with various diseases in mammals, such as cancer and myocardial pain. Researchers have long tried to use machine learning and deep learning to learn the characteristics of DNA sequences with high precision for methylation classification. However, these studies primarily innovated in sequence encoding and seldom employed deep neural networks for prediction. Hence, this research proposes a framework that applies random masking and adversarial sample generation in the preprocessing stage. The proposed classification model is composed of a convolutional neural network (CNN), bidirectional long short-term memory (Bi-LSTM), and an attention mechanism as predictors. Benchmark results illustrate the automation and advancement of the proposed framework, which can accurately perform binary classification of diverse DNA methylation types. Ablation experiments show that random masking and adversarial sample generation are effective. In detail, our model achieved best accuracies of 85.07%, 94.97%, and 92.17% in predicting multi-species N4-methylcytosine, 5-methylcytosine, and N6-methyladenine sites, respectively. Moreover, in a comparison with two other methods using the same datasets and evaluation metrics, the proposed model (namely MaskDNA-PGD) surpasses both. Finally, MaskDNA-PGD can be freely accessed via https://github.com/willyzzz/MaskDNA-PGD.
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical
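
The random masking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_mask` and `augment`, the mask token "N", and the 15% mask rate are all assumptions chosen for illustration. The sketch only shows the general idea of producing several masked variants of one DNA sequence as augmented training samples.

```python
import random


def random_mask(seq, mask_rate=0.15, mask_token="N", seed=None):
    """Return a copy of seq with a random subset of positions masked.

    mask_rate and mask_token are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    n_mask = round(len(seq) * mask_rate)
    # Sample distinct positions so exactly n_mask bases are replaced.
    positions = set(rng.sample(range(len(seq)), n_mask))
    return "".join(
        mask_token if i in positions else base
        for i, base in enumerate(seq)
    )


def augment(seq, n_copies=3, mask_rate=0.15, seed=0):
    """Generate n_copies masked variants, one per derived seed."""
    return [
        random_mask(seq, mask_rate=mask_rate, seed=seed + k)
        for k in range(n_copies)
    ]


if __name__ == "__main__":
    original = "ACGT" * 5  # hypothetical 20-base input sequence
    for variant in augment(original):
        print(variant)
```

Each variant keeps the original length and differs from the input only at the masked positions, so the class label of the original sequence can be reused for every augmented copy.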