Identifying uncertainty in physical–chemical property estimation with IFSQSAR

Trevor N. Brown, Alessandro Sangion, Jon A. Arnot
DOI: https://doi.org/10.1186/s13321-024-00853-w
2024-06-02
Journal of Cheminformatics
Abstract: This study describes the development and evaluation of six new models for predicting physical–chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility in water (SW) and in octanol (SO), vapor pressure (VP), and the octanol–water (KOW), octanol–air (KOA), and air–water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection Quantitative Structure–Activity Relationship (IFSQSAR) Python package, version 1.1.0, as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations that combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models have also been developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and for calculating uncertainty estimates, expressed as 95% prediction intervals (PIs), are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. These measured data are external to the IFSQSAR training and validation datasets and are used to assess the predictivity of the models for "novel chemicals" in an unbiased manner. The 95% PIs calculated from the validation datasets for partition ratios had to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily because it is difficult to distinguish whether a chemical is a liquid or a solid at room temperature. The prediction accuracy of the models for log KOW, log KAW, and log KOA of novel, data-poor chemicals is estimated to be in the range of 0.7 to 1.4 root mean squared error of prediction (RMSEP), and 1.7 to 1.8 RMSEP for log VP and log SW.
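The PPLFER approach described in the abstract can be sketched as a linear combination of system parameters and solute descriptors. The sketch below is a minimal illustration that assumes the common Abraham-type form, log K = c + eE + sS + aA + bB + vV, with an lL term typically used instead of vV for gas-to-solvent systems. The class and function names are hypothetical and are not the IFSQSAR API, and the numbers in the usage lines are placeholders rather than fitted values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoluteDescriptors:
    """Abraham-type solute descriptors (assumed; in IFSQSAR these come from QSPRs)."""
    E: float  # excess molar refraction
    S: float  # dipolarity/polarizability
    A: float  # overall hydrogen-bond acidity
    B: float  # overall hydrogen-bond basicity
    V: float  # McGowan characteristic volume
    L: float  # log hexadecane-air partition ratio


@dataclass(frozen=True)
class SystemParameters:
    """Experimentally calibrated coefficients for one partitioning system."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float = 0.0  # volume term, typical for condensed-phase systems (e.g. KOW)
    l: float = 0.0  # L term, typical for gas-to-solvent systems (e.g. KOA)


def pplfer_log_k(system: SystemParameters, solute: SoluteDescriptors) -> float:
    """Return log10 K as the linear combination c + eE + sS + aA + bB + vV + lL."""
    return (
        system.c
        + system.e * solute.E
        + system.s * solute.S
        + system.a * solute.A
        + system.b * solute.B
        + system.v * solute.V
        + system.l * solute.L
    )


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are NOT fitted values.
    octanol_water_like = SystemParameters(c=0.1, e=0.5, s=-1.0, a=0.0, b=-3.0, v=3.5)
    solute = SoluteDescriptors(E=0.6, S=0.5, A=0.0, B=0.2, V=0.7, L=3.0)
    print(pplfer_log_k(octanol_water_like, solute))
```

Keeping the system parameters and solute descriptors as separate objects mirrors the split described in the abstract: one set of solute descriptors can be reused across every partitioning system, so predicting several properties for the same chemical only requires swapping the system parameters.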
chemistry, multidisciplinary; computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
The paper addresses the problem of identifying and quantifying uncertainty in predictions of the physical–chemical properties of chemicals. Specifically, the study developed six new models for physical–chemical (PC) properties that are crucial for chemical hazard, exposure, and risk assessment: water solubility (SW), octanol solubility (SO), vapor pressure (VP), and the octanol–water (KOW), octanol–air (KOA), and air–water (KAW) partition ratios. The models are PPLFER equations that combine experimentally calibrated system parameters with solute descriptors predicted by quantitative structure–property relationships (QSPRs). The study also developed two ancillary models: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals (liquid or solid) at room temperature. The research focuses on how well the models, together with their applicability domain and prediction-interval estimates, perform for "novel chemicals", using measured data that are external to the training and validation datasets. For novel, data-poor chemicals, the root mean squared error of prediction (RMSEP) is estimated at 0.7 to 1.4 log units for the octanol–water, air–water, and octanol–air partition ratios, and at 1.7 to 1.8 log units for vapor pressure and water solubility. Predictions of vapor pressure and water solubility therefore remain considerably more uncertain, primarily because it is difficult to distinguish whether a chemical is a liquid or a solid at room temperature.
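The error and uncertainty statistics discussed above can be illustrated with a short sketch. It computes RMSEP, the fraction of external observations that fall inside prediction intervals, and the smallest factor by which the interval half-widths must be multiplied to reach a target coverage (the abstract reports a factor of 1.25 for partition ratios). The function names are hypothetical, and the half-widths are taken as given inputs, because how IFSQSAR derives them from validation data is not described here.

```python
import math
from typing import Sequence


def rmsep(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of prediction, in the data's own (log) units."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pi_coverage(observed, predicted, half_widths, scale: float = 1.0) -> float:
    """Fraction of observations inside predicted +/- scale * half_width."""
    inside = sum(
        abs(o - p) <= scale * h
        for o, p, h in zip(observed, predicted, half_widths)
    )
    return inside / len(observed)


def min_scale_for_coverage(observed, predicted, half_widths, target: float = 0.95) -> float:
    """Smallest factor k (up to float rounding) whose scaled PIs cover `target` of the data.

    An observation is covered once k >= |error| / half_width, so k is an
    empirical quantile of those ratios. Half-widths must be positive.
    """
    ratios = sorted(
        abs(o - p) / h for o, p, h in zip(observed, predicted, half_widths)
    )
    # Number of observations that must be covered; the small epsilon guards
    # against floating-point noise in target * n pushing ceil() up by one.
    needed = max(1, math.ceil(target * len(ratios) - 1e-9))
    return ratios[needed - 1]
```

In this framing, the observed values would be the external measurements and the half-widths the validation-derived 95% PI half-widths. The empirical-quantile rule is one simple way to find such a factor and is not necessarily the procedure used in the paper.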