Relationship between prediction accuracy and uncertainty in compound potency prediction using deep neural networks and control models

Jannik P. Roth, Jürgen Bajorath
DOI: https://doi.org/10.1038/s41598-024-57135-6
IF: 4.6
2024-03-20
Scientific Reports
Abstract: The assessment of prediction variance or uncertainty contributes to the evaluation of machine learning models. In molecular machine learning, uncertainty quantification is an evolving area of research where currently no standard approaches or general guidelines are available. We have carried out a detailed analysis of deep neural network variants and simple control models for compound potency prediction to study relationships between prediction accuracy and uncertainty. For comparably accurate predictions obtained with models of different complexity, highly variable prediction uncertainties were detected using different metrics. Furthermore, a strong dependence of prediction characteristics and uncertainties on potency levels of test compounds was observed, often leading to over- or under-confident model decisions with respect to the expected variance of predictions. Moreover, neural network models responded very differently to training set modifications. Taken together, our findings indicate that there is only little, if any correlation between compound potency prediction accuracy and uncertainty, especially for deep neural network models, when predictions are assessed on the basis of currently used metrics for uncertainty quantification.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the lack of standard approaches and general guidelines for uncertainty quantification (UQ) in molecular machine learning, focusing on compound potency prediction. To study how prediction accuracy relates to prediction uncertainty, the authors carry out a detailed analysis of deep neural network variants and simple control models. Key findings include:

1. **Highly Variable Prediction Uncertainties**: Models of different complexity produced comparably accurate predictions, yet their prediction uncertainties varied widely depending on the metric used.
2. **Potency Level Dependence**: Prediction characteristics and uncertainties depended strongly on the potency levels of the test compounds, which often led to over- or under-confident model decisions with respect to the expected variance of predictions.
3. **Response to Training Set Modifications**: Neural network models responded very differently to modifications of the training set.
4. **Correlation Between Accuracy and Uncertainty**: There is little, if any, correlation between compound potency prediction accuracy and uncertainty, especially for deep neural network models, when predictions are assessed with currently used UQ metrics.
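The kind of question examined here, whether per-compound prediction error tracks an uncertainty estimate, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' protocol: the data are synthetic numbers rather than real potency values, the model is a bootstrap ensemble of least-squares regressors standing in for the neural networks and control models, uncertainty is taken as the standard deviation across ensemble members, and the Spearman rank correlation is just one possible way to relate error to uncertainty. The helper names (`rank`, `spearman`, `ensemble_predict`) are hypothetical.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks (assumes continuous data without ties)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def ensemble_predict(X_train, y_train, X_test, n_models=20, seed=0):
    """Fit least-squares models on bootstrap resamples of the training set.

    Returns the per-sample mean prediction and the per-sample standard
    deviation across ensemble members (used here as the uncertainty).
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        coef, *_ = np.linalg.lstsq(X_train[idx], y_train[idx], rcond=None)
        preds.append(X_test @ coef)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)


# Synthetic stand-in data: 8 made-up descriptors and a noisy linear response.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(300, 8))
true_w = data_rng.normal(size=8)
y = X @ true_w + data_rng.normal(scale=0.5, size=300)

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

mean_pred, std_pred = ensemble_predict(X_train, y_train, X_test)
abs_error = np.abs(mean_pred - y_test)   # per-sample accuracy measure
mae = float(abs_error.mean())            # overall accuracy
rho = spearman(abs_error, std_pred)      # does error track uncertainty?

print(f"MAE: {mae:.3f}")
print(f"Spearman correlation (|error| vs. ensemble std): {rho:.3f}")
```

A value of `rho` near zero would mean the uncertainty estimate carries little information about which individual predictions are wrong, even if the overall error is low. Repeating the calculation within separate ranges of the target value is one way to look for the potency-level dependence described above.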