Nonlinear QSAR models with high-dimensional descriptor selection and SVR improve toxicity prediction and evaluation of phenols on Photobacterium phosphoreum

Wei Zhou, Shubo Wu, Zhijun Dai, Yuan Chen, Yan Xiang, Jianrong Chen, Chunyu Sun, Qingming Zhou, Zheming Yuan
DOI: https://doi.org/10.1016/j.chemolab.2015.04.010
IF: 4.175
2015-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Assessing the risk of chemicals is an important task in environmental protection. In this paper, we developed quantitative structure–activity relationship (QSAR) methods to evaluate the toxicity of phenols to Photobacterium phosphoreum, which is an important indicator of water quality. We first built a support vector regression (SVR) model using three descriptors; the SVR model (t=2) had the highest external prediction ability (MSE_ext = 0.068, Q²_ext = 0.682), about 40% higher than that of the literature model. Second, to identify more effective descriptors, we applied in-house methods to select descriptors with clear meanings from 2835 descriptors calculated by PCLIENT and used them to construct SVR models. Our twenty new QSAR models significantly increased the standard regression coefficient on the test set (MSE_ext values ranged from 0.003 to 0.063 and Q²_ext values ranged from 0.708 to 0.985). The Y-randomization (response permutation) test and different splits of the training/test datasets also supported the excellent predictive power of the best SVR model. We further evaluated the regression significance of our SVR model and the importance of each individual descriptor in the model through interpretability analysis. Our work provides a useful theoretical understanding of the toxicity of phenol analogues.
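As a reading aid for the two external-validation statistics quoted in the abstract, the following is a minimal sketch of how MSE_ext and Q²_ext are commonly computed. The abstract does not state which Q²_ext variant the authors used; this sketch assumes the form that measures test-set deviations from the training-set mean. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mse_ext(y_true, y_pred):
    """Mean squared error over the external (test) compounds."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def q2_ext(y_true, y_pred, y_train_mean):
    """External Q^2 = 1 - PRESS / SS.

    Assumption: SS is taken around the training-set mean; other
    variants use the test-set mean instead.
    """
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - y_train_mean) ** 2 for t in y_true)
    return 1.0 - press / ss


# Hypothetical toxicity values, for illustration only (not data from the paper).
y_train = [0.5, 1.0, 1.5, 2.0]   # observed responses in the training set
y_test = [1.0, 2.0, 3.0]         # observed responses in the external test set
y_hat = [1.1, 1.9, 3.2]          # model predictions for the test set

print("MSE_ext:", mse_ext(y_test, y_hat))
print("Q2_ext:", q2_ext(y_test, y_hat, mean(y_train)))
```

Under this definition, a lower MSE_ext and a Q²_ext closer to 1 both indicate better external prediction, which is the sense in which the abstract compares its models.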