Prediction of the Heat Capacity for Compounds Based on the Conjugate Gradient and Support Vector Machine Methods

Jingjie Shi, Liping Chen, Wanghua Chen
DOI: https://doi.org/10.1002/cem.2532
IF: 2.5
2013-01-01
Journal of Chemometrics
Abstract: A quantitative structure-property relationship (QSPR) model for predicting heat capacity was developed from molecular structures. Molecular structure descriptors spanning 18 categories were calculated with DRAGON 2.1 to represent the structures of the compounds. The novel variable selection method of the ant colony optimization (ACO) algorithm was used to select, from the large pool of calculated descriptors, an optimal subset that contributes significantly to the property. As a result, five descriptors were selected as input parameters. With these five descriptors, ACO was coupled with the conjugate gradient (CG) method and the support vector machine (SVM) method to construct a linear model (ACO-CG) and a nonlinear model (ACO-SVM), respectively. Both models were robust and gave small prediction errors. In addition, the fitting and predictive performance of the ACO-SVM model (squared correlation coefficient, $R^2_{\mathrm{train}} = 0.9607$, $R^2_{\mathrm{test}} = 0.9398$) were both better than those of the ACO-CG model ($R^2_{\mathrm{train}} = 0.9404$, $R^2_{\mathrm{test}} = 0.9281$). The traditional validation parameters $Q^2_{\mathrm{loo}}$ (internal validation) and $Q^2_{\mathrm{ext}}$ (external validation) were supplemented with two novel parameters, $r_m^2$ and ${}^{c}R_p^2$, for a stricter validation test. The developed models met the required values for both novel parameters: $\overline{r_m^2} > 0.5$ and $\Delta r_m^2 < 0.2$ for $r_m^2$, and ${}^{c}R_p^2 > 0.5$. From the preceding analysis, it can be concluded that the proposed methods can successfully predict heat capacity from preselected theoretical descriptors that are calculated solely from the molecular structure. The applicability domain of the model was assessed with a Williams plot. Copyright (c) 2013 John Wiley & Sons, Ltd.
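The abstract does not say how the Williams plot was built, so the following is only a minimal sketch of one common way to compute its two axes: leverage values and standardized residuals. Everything here is an assumption rather than the authors' procedure: the function names are hypothetical, an intercept column is added to the descriptor matrix, the warning leverage is taken as the conventional h* = 3(p + 1)/n, the residual limit is set to 3 standard deviations, and the descriptor data are random numbers, not values from the paper.

```python
import numpy as np


def _with_intercept(X):
    """Prepend a column of ones so the intercept counts as a model parameter."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (A^T A)^-1 x_i, with A the augmented training matrix."""
    A = _with_intercept(X_train)
    Q = _with_intercept(X_query)
    core = np.linalg.pinv(A.T @ A)  # pseudo-inverse tolerates collinear descriptors
    return np.einsum("ij,jk,ik->i", Q, core, Q)


def warning_leverage(n_train, n_descriptors):
    """Conventional threshold h* = 3(p + 1)/n (assumed, not taken from the paper)."""
    return 3.0 * (n_descriptors + 1) / n_train


def standardized_residuals(y_obs, y_pred):
    """Residuals divided by their sample standard deviation."""
    res = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return res / res.std(ddof=1)


def in_domain(h, std_res, h_star, res_limit=3.0):
    """True where a compound lies inside both Williams-plot limits."""
    return (np.asarray(h) <= h_star) & (np.abs(std_res) <= res_limit)


# Synthetic stand-in for a 50-compound training set with five descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
h_train = leverages(X_train, X_train)
h_star = warning_leverage(*X_train.shape)
```

Under these assumptions, a compound whose leverage exceeds `h_star` would be treated as structurally outside the training space, and one whose standardized residual exceeds the limit as a response outlier; plotting `standardized_residuals` against `leverages` gives the Williams plot itself.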