QSPR Study for Prediction of Boiling Points of 2475 Organic Compounds Using Stochastic Gradient Boosting

Jue-hong Zhang, Zai-ming Liu, Wan-rong Liu
DOI: https://doi.org/10.1002/cem.2587
IF: 2.5
2014-01-01
Journal of Chemometrics
Abstract: The normal boiling point is one of the major physicochemical properties used to characterize and identify an organic compound. In this study, a boosted regression tree model was developed to capture the quantitative structure–property relationship (QSPR) for the boiling points of 2475 structurally heterogeneous compounds. Stochastic gradient boosting (SGB) constructs an additive regression model by sequentially fitting a simple regression tree to the current "pseudo"-residuals by least squares at each iteration. The SGB parameters were optimized using 10-fold cross-validation. The best SGB model, built with 2D descriptors, achieved an overall Q² of 0.957 and a root mean square error of validation of 17.89 on the validation set, and an R_T² of 0.954 and a root mean square error of test of 18.19 on the test set. Compared with other commonly used modeling methods such as partial least squares, classification and regression trees, and random forests, SGB not only achieved the best predictive ability but also, with the help of partial dependence plots, offered more useful insight into the relationship between the descriptors and the boiling points. SGB could be a promising tool in QSPR research, especially for screening new compounds. Copyright © 2014 John Wiley & Sons, Ltd.
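The fitting procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes squared-error loss (for which the pseudo-residuals are the ordinary residuals), uses one-split regression stumps on a single synthetic feature in place of full trees on 2D molecular descriptors, and tunes only a hypothetical learning-rate grid by 10-fold cross-validation. All function names, parameter values, and the data are illustrative assumptions.

```python
import math
import random

def fit_stump(xs, rs):
    """Least-squares regression stump: returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, rs))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no valid split between equal x values
        left = [r for _, r in pairs[:k]]
        right = [r for _, r in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm)
    if best is None:  # all x identical: fall back to a constant
        m = sum(rs) / len(rs)
        return (pairs[0][0], m, m)
    return best[1:]

def stump_predict(stump, x):
    thr, lm, rm = stump
    return lm if x <= thr else rm

def sgb_fit(xs, ys, n_trees=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared loss (illustrative defaults)."""
    rng = random.Random(seed)
    n = len(xs)
    f0 = sum(ys) / n              # initial model: the mean response
    preds = [f0] * n
    stumps = []
    m = max(2, int(subsample * n))
    for _ in range(n_trees):
        idx = rng.sample(range(n), m)  # the "stochastic" step: subsample without replacement
        # Pseudo-residuals for squared loss are simply y - current prediction.
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return f0, learning_rate, stumps

def sgb_predict(model, x):
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)

def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

def cv_rmse(xs, ys, k=10, **params):
    """k-fold cross-validated RMSE, pooled over all held-out points."""
    n = len(xs)
    held_out = [0.0] * n
    for fold in range(k):
        test = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test]
        model = sgb_fit([xs[i] for i in train], [ys[i] for i in train], **params)
        for i in test:
            held_out[i] = sgb_predict(model, xs[i])
    return rmse(ys, held_out)

# Synthetic stand-in data (NOT boiling points): a noisy nonlinear response.
rng = random.Random(42)
xs = [rng.uniform(0, 6) for _ in range(120)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

baseline = rmse(ys, [sum(ys) / len(ys)] * len(ys))  # predict-the-mean model
scores = {lr: cv_rmse(xs, ys, k=10, learning_rate=lr) for lr in (0.05, 0.3)}
best_lr = min(scores, key=scores.get)               # pick the rate with lowest CV error
model = sgb_fit(xs, ys, learning_rate=best_lr)
train_rmse = rmse(ys, [sgb_predict(model, x) for x in xs])
print("CV RMSE by learning rate:", scores)
print("baseline RMSE:", baseline, "training RMSE:", train_rmse)
```

In practice a library implementation with multi-feature trees would be used. The sketch only shows how subsampling, residual fitting, shrinkage, and cross-validated parameter selection fit together.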