QSAR Modeling of E. coli Promoters with Parameters Selected by Binary Matrix Shuffling Filter

Kai Wang, Li-Feng Wang, Zhi-Jun Dai, Lian-Yang Bai, Zhe-Ming Yuan
DOI: https://doi.org/10.5281/zenodo.5746371
IF: 0.243
2014-01-01
Journal of the Indian Chemical Society
Abstract: A total of 1123 topological structure parameters of DNA bases were used directly as descriptors to characterize the sequences of 38 E. coli promoters. For the resulting high-dimensional feature set, correlation analysis and a binary matrix shuffling filter (BMSF) were applied successively to remove redundant or uninformative features, leaving only 20 features, each with a definite meaning. Based on the retained features and support vector regression (SVR), a quantitative structure-activity relationship (QSAR) model was established for the 38 E. coli promoters. The leave-one-out (LOO) prediction accuracy of this model was 0.838, superior to that of the reference model, partial least squares (PLS). According to the SVR interpretation system, the QSAR model shows a highly significant nonlinear regression, and the relationship between actual promoter strength and the 11 significant retained features is given directly. This work provides an efficient tool for the QSAR analysis of promoters and other similar molecular sequences.
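The correlation-analysis step that precedes BMSF can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy keep-or-drop rule, the function names `pearson` and `correlation_filter`, the 0.95 threshold, and the toy data are all assumptions chosen only to show how highly correlated (redundant) descriptor columns might be pruned before a second-stage filter.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(columns, threshold=0.95):
    """Greedy redundancy filter (hypothetical rule, threshold is an assumption).

    A feature column is kept only if its absolute correlation with every
    previously kept column is below `threshold`. Returns the kept indices.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy descriptor matrix stored column-wise (made-up values, not from the paper).
cols = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # exact multiple of column 0, so redundant
    [1.0, -1.0, 1.0, -1.0],  # weakly correlated with column 0
]
kept = correlation_filter(cols)
print(kept)
```

In this toy case the second column is dropped because it is perfectly correlated with the first. In a pipeline like the one described, the surviving columns would then go to a further selection step and to the regression model.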