Promoter Prediction in Agrobacterium tumefaciens Strain C58 by Using Artificial Intelligence Strategies

Hasan Zulfiqar,Ramala Masood Ahmad,Ali Raza,Sana Shahzad,Hao Lin
DOI: https://doi.org/10.1007/978-1-0716-4063-0_2
Abstract:Promoters are the genomic regions upstream of genes that RNA polymerase binds in order to initiate gene transcription. Understanding the regulation of gene expression depends on being able to identify promoters, because they are the most important component of gene expression. Agrobacterium tumefaciens (A. tumefaciens) strain C58 was the subject of this study with the goal of creating a machine learning-based model to predict promoters. In this study, nucleotide density (ND), k-mer, and one-hot were used to encode the promoter sequence. Support vector machine (SVM) on fivefold cross-validation with incremental feature selection (IFS) was used to optimize the generated features. These improved characteristics were then used to distinguish promoter sequences by feeding them into the random forest (RF) classifier. Tenfold cross-validation (CV) analysis revealed that the projected model has the ability to produce an accuracy of 84.22%.
What problem does this paper attempt to address?