A Method for Yeast Promoter Strength Prediction Based on a Branched CNN Feature Extractor

Wenfa Wu, Min Liu
DOI: https://doi.org/10.1145/3543377.3543400
2022-01-01
Abstract: Promoters with desired strengths are of great importance in metabolic engineering and synthetic biology applications. However, identifying such promoters experimentally with existing mutagenesis-based methods is time-consuming and laborious. With the accumulation of sequence data in the post-genomic era and the development of machine learning, it is imperative to develop computational models that precisely predict strength from promoter sequences, as such models are expected to enable rapid generation of large promoter sets with desired strengths. In this paper, we propose a hybrid multimodal model to predict yeast promoter strength. First, guided by prior biological knowledge, we encoded promoter sequences into three modalities, namely nucleotide composition, trinucleotide composition, and dinucleotide structural property, and introduced three parallel convolutional network branches to extract hidden representations of the promoter sequences. Then, we merged the feature vectors generated from the three encodings to build the final model. In experiments on a yeast promoter benchmark dataset, the model achieved 67.2% accuracy, 68.65% sensitivity, 83.23% specificity, and an AUC of 82.19%. Moreover, a comparison with existing promoter strength prediction tools showed that our method outperformed them, verifying its effectiveness. The runnable source code of the proposed method is available at https://github.com/WWF-coding/Yeast_Promoter_Strength_Prediction.
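The abstract names three input modalities, each fed to its own convolutional branch. The sketch below shows one plausible way to produce them from a raw sequence. It is not the authors' code: treating "nucleotide composition" as per-position one-hot vectors, "trinucleotide composition" as a normalized 64-dimensional 3-mer frequency vector, and "dinucleotide structural property" as a per-step lookup are all assumptions. The property table is caller-supplied, and the `toy_table` at the end holds made-up values for illustration only, not real physicochemical parameters.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 64 trinucleotides in lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]


def one_hot(seq):
    """Assumed 'nucleotide composition' encoding: one length-4 vector per
    position; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def trinucleotide_composition(seq):
    """Assumed 'trinucleotide composition' encoding: normalized frequency of
    each of the 64 overlapping 3-mers (3-mers with unknown bases are skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]


def dinucleotide_property(seq, table):
    """Assumed 'dinucleotide structural property' encoding: one property vector
    per dinucleotide step (len(seq) - 1 steps). `table` is a hypothetical,
    non-empty mapping such as {'AT': [v1, v2, ...]} supplied by the caller."""
    seq = seq.upper()
    width = len(next(iter(table.values())))
    return [table.get(seq[i:i + 2], [0.0] * width) for i in range(len(seq) - 1)]


def encode(seq, table):
    """Return the three modality encodings, one input per CNN branch."""
    return one_hot(seq), trinucleotide_composition(seq), dinucleotide_property(seq, table)


# Toy table with made-up values (NOT real structural parameters):
# AA -> 0.0, AC -> 1.0, ..., TT -> 15.0 in lexicographic order.
toy_table = {a + b: [float(i)] for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))}
nuc, tri, di = encode("ACGTAC", toy_table)
```

Keeping the three encodings separate, rather than concatenating them up front, matches the abstract's design of one branch per modality, with fusion happening only after each branch has produced its feature vector.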