Prediction of probability distributions of molecular properties: towards more efficient virtual screening and better understanding of compound representations

Jarosław Duda, Sabina Podlewska
DOI: https://doi.org/10.1007/s11030-022-10589-0
Abstract: In silico approaches to predicting the activity and properties of chemical compounds now form the basis of computer-aided drug design. While the general focus is on predicting values, it is mathematically more appropriate to predict probability distributions, which offers additional possibilities such as the evaluation of uncertainty, higher moments, and quantiles. In this study, we applied the Hierarchical Correlation Reconstruction approach to assess several ADMET properties of chemical compounds. It uses multiple linear regression to independently assess multiple moments, which are then combined into a predicted probability distribution. During virtual screening, the method enables inexpensive selection of compounds whose properties are nearly certain to fall into a particular range, as well as automatic rejection of predictions characterized by high uncertainty; however, unlike currently used virtual screening methods, it focuses on predicting the distribution of a property, not its actual value. Moreover, the presented protocol enables the detection of structural features that should be carefully considered when optimizing compounds towards a particular property, and it provides a deeper understanding of the examined compound representations.
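To make the "regress each moment, then combine" idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions the abstract does not state: property values are normalized to (0, 1) through their empirical CDF, the "moments" are coefficients of the first three orthonormal (rescaled Legendre) polynomials on [0, 1], and the combined polynomial is clipped at a small positive floor and renormalized to give a density. The function names (`legendre_basis`, `fit_moment_models`, `predict_density`) and the `floor` value are hypothetical.

```python
import numpy as np

def legendre_basis(u):
    """Orthonormal (rescaled Legendre) polynomials f_1..f_3 on [0, 1]."""
    u = np.asarray(u, dtype=float)
    return np.stack([
        np.sqrt(3) * (2 * u - 1),
        np.sqrt(5) * (6 * u**2 - 6 * u + 1),
        np.sqrt(7) * (20 * u**3 - 30 * u**2 + 12 * u - 1),
    ], axis=-1)

def fit_moment_models(X, y):
    """Fit one least-squares linear model per moment-like coefficient."""
    n = len(y)
    # Assumption: normalize targets to (0, 1) via their empirical CDF (ranks).
    ranks = np.argsort(np.argsort(y))
    u = (ranks + 0.5) / n
    A = np.hstack([np.ones((n, 1)), X])  # intercept column + features
    # All three regressions are solved independently in a single call.
    beta, *_ = np.linalg.lstsq(A, legendre_basis(u), rcond=None)
    return beta  # shape: (1 + n_features, 3)

def predict_density(beta, x, grid_size=200, floor=0.03):
    """Combine predicted coefficients into a density on the normalized axis."""
    grid = (np.arange(grid_size) + 0.5) / grid_size  # midpoints of [0, 1]
    a = np.concatenate([[1.0], x]) @ beta            # predicted coefficients
    rho = 1.0 + legendre_basis(grid) @ a
    rho = np.maximum(rho, floor)  # a polynomial can dip below zero; clip it
    rho /= rho.mean()             # renormalize so it integrates to 1 on [0, 1]
    return grid, rho
```

In this sketch, a compound whose predicted density concentrates in a narrow part of the normalized axis would be a candidate for confident selection, while a nearly flat density (all coefficients close to zero) signals a prediction that carries little information and could be rejected.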