Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network

Lu Wang, Ping Wan
DOI: https://doi.org/10.11648/j.cbb.20180602.11
2018-01-01
Computational Biology and Bioinformatics
Abstract: Promoters are important cis-acting elements in genomes and play key roles in gene regulation. Each gene is regulated by a specific type of promoter, so determining which type of promoter regulates a gene is crucial for exploring the gene's function. Although several computational methods for promoter prediction have been proposed, their performance is not satisfactory. The convolutional neural network (CNN) is a powerful deep learning model that has been applied in bioinformatics in recent years. To improve the performance of promoter prediction, in this study six types of Escherichia coli K-12 promoter DNA sequences were collected from the RegulonDB database, and a CNN model was constructed to predict promoters using the Keras platform. The CNN model is composed of two convolutional layers, three dropout layers, four batch normalization layers and one hidden layer. To evaluate the performance of the CNN model, 10-fold cross-validation was performed and receiver operating characteristic (ROC) curves were plotted. The results show that the prediction accuracies for the sigma 24, sigma 28, sigma 32, sigma 38, sigma 54 and sigma 70 promoters are 94%, 97%, 95%, 95%, 97% and 83%, respectively. The CNN model achieves the highest accuracy in promoter prediction reported to date. In conclusion, the CNN is the best-performing model for promoter prediction, and it is a promising model for both DNA and protein sequence analysis.
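The evaluation protocol named in the abstract (10-fold cross-validation, accuracy, and ROC analysis) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `k_fold_indices`, `roc_auc` and `accuracy`, the random seed, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the abstract does not state how the folds were drawn or how the ROC curves were summarized.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles sample indices once, then deals them
    into k folds so every sample is used for testing exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy example: 1 = promoter, 0 = non-promoter; scores are made-up
# classifier outputs, not results from the paper.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

In a real run, each `(train_idx, test_idx)` pair would be used to fit the model on the training indices and score the held-out indices, with accuracy and AUC then averaged across the ten folds.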