A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation

Waleed Alam, Syed Danish Ali, Hilal Tayara, Kil To Chong
DOI: https://doi.org/10.1109/access.2020.3002995
IF: 3.9
2020-01-01
IEEE Access
Abstract: Post-transcriptional modifications such as N6-methyladenosine (m6A) play a crucial role in the stability and regulation of gene expression. Identifying m6A sites is therefore essential for understanding the functional mechanisms of biological processes. Several machine learning techniques based on handcrafted feature extraction methods have been proposed to facilitate this laborious work. However, because of inefficient feature extraction, these techniques increase computational complexity and thereby reduce the accuracy of m6A identification. This paper proposes a fast and reliable predictive model for the identification of m6A sites. The proposed model is based on a convolutional neural network (CNN), which extracts the most significant features from RNA sequences encoded by concatenating one-hot and nucleotide chemical property representations. The model is applied to and tested on benchmark datasets from multiple species and evaluated against state-of-the-art predictive models. The results indicate that the proposed model achieves accuracies of 93.6 %, 93.8 %, 85.0 % and 92.5 % on the benchmark datasets of Homo sapiens (H. sapiens), Mus musculus (M. musculus), Saccharomyces cerevisiae (S. cerevisiae), and Arabidopsis thaliana (A. thaliana), respectively. The proposed model could help the research community with m6A identification. In addition, an easy-to-use web server is available at https://home.jbnu.ac.kr/NSCL/pm6acnn.htm.
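The encoding step described in the abstract, concatenating a one-hot vector with nucleotide chemical properties (NCP) for each base, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `encode_rna`, the A/C/G/U one-hot ordering, and the NCP table (ring structure, functional group, hydrogen bonding, as commonly used in the literature) are assumptions that the abstract does not specify.

```python
# Assumed one-hot ordering: A, C, G, U (the abstract does not state the ordering).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Assumed NCP table: (ring structure, functional group, hydrogen bonding).
# purine = 1 / pyrimidine = 0, amino = 1 / keto = 0, weak = 1 / strong = 0.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(seq):
    """Return one 7-element row per nucleotide: 4 one-hot values + 3 NCP values.

    Hypothetical helper for illustration. Unknown symbols (e.g. 'N') map to
    all zeros, and 'T' is treated as 'U' so DNA-style input also works.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        one_hot = ONE_HOT.get(base, (0, 0, 0, 0))
        ncp = NCP.get(base, (0, 0, 0))
        rows.append(list(one_hot + ncp))
    return rows


if __name__ == "__main__":
    for row in encode_rna("GAC"):
        print(row)
```

Under these assumptions a sequence of length L becomes an L x 7 matrix, which is the kind of input a 1D convolutional layer could consume.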