The SAMPL6 challenge on predicting octanol–water partition coefficients from EC-RISM theory
Nicolas Tielker, Daniel Tomazic, Lukas Eberlein, Stefan Güssregen, Stefan M. Kast
DOI: https://doi.org/10.1007/s10822-020-00283-4
2020-01-24
Abstract: Results are reported for octanol–water partition coefficients (log P) of the neutral states of drug-like molecules provided during the SAMPL6 (Statistical Assessment of Modeling of Proteins and Ligands) blind prediction challenge, obtained by applying the “embedded cluster reference interaction site model” (EC-RISM) as a solvation model for quantum-chemical calculations. Following the strategy outlined during earlier SAMPL challenges, we first train 1- and 2-parameter water-free (“dry”) and water-saturated (“wet”) models for n-octanol solvation Gibbs energies with respect to experimental values from the “Minnesota Solvation Database” (MNSOL), yielding a root mean square error (RMSE) of 1.5 kcal mol⁻¹ for the best-performing 2-parameter wet model, while the optimal water model developed for the pKa part of the SAMPL6 challenge is kept unchanged (RMSE 1.6 kcal mol⁻¹ for neutral compounds from a model trained on both neutral and ionic species). Applying these models to the blind prediction set yields a log P RMSE of less than 0.5 for our best model (2-parameter, wet). Further analysis of our results reveals that a single compound, SM15, is responsible for most of the error; without it, the RMSE drops to 0.2. Since this is the only compound in the challenge dataset with a hydroxyl group, we investigate other alcohols for which Gibbs energy of solvation data for both water and n-octanol are available in the MNSOL database to demonstrate a systematic cause of error and to discuss strategies for improvement.
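The abstract quotes errors both in solvation Gibbs energies (kcal mol⁻¹) and in log P units. As a minimal sketch of how the two are related, the snippet below uses the standard thermodynamic relation log P = (ΔG_water − ΔG_octanol) / (RT ln 10), together with a plain RMSE. It is not the authors' EC-RISM workflow: the function names `log_p` and `rmse` and the temperature default of 298.15 K are assumptions made here for illustration.

```python
import math

# Gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def log_p(dg_water, dg_octanol, temperature=298.15):
    """Octanol-water log P from solvation Gibbs energies (kcal/mol).

    A solute that is better solvated in octanol (more negative
    dg_octanol) gets a positive log P.
    """
    return (dg_water - dg_octanol) / (R_KCAL * temperature * math.log(10))


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

At 298.15 K the factor RT ln 10 is about 1.364 kcal mol⁻¹, so a 1.364 kcal mol⁻¹ error in the transfer Gibbs energy corresponds to one log P unit. Errors in the water and octanol models that share the same sign partly cancel in the difference.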
biochemistry & molecular biology; biophysics; computer science, interdisciplinary applications