Denoising Drug Discovery Data for Improved ADMET Property Prediction

Alan Cheng, Yunsie Chung, Matthew Adrian

DOI: https://doi.org/10.26434/chemrxiv-2024-v4jvc

2024-04-22

Abstract: Predicting ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of small molecules is a key task in drug discovery. A major challenge in building better ADMET models is the experimental error inherent in the data. Furthermore, ADMET predictors are typically regression tasks due to the continuous nature of the data. This makes it difficult to apply existing methods as most focus on classification tasks. Here, we develop denoising schemes based on deep learning to address this. We find that the training error can be used to identify the noise in regression tasks while ensemble-based and forgotten event-based metrics fail to detect the noise. The most significant performance increase occurs when the original model is fine-tuned with the denoised data using training error as the noise detection metric. Our method has the ability to improve models with medium noise and does not degrade the performance of models with noise outside this range. To our knowledge, our denoising scheme is the first to improve model performance for ADMET data and has implications for improving models for experimental assay data in general.

Chemistry

What problem does this paper attempt to address?

This paper addresses noise in drug discovery data, with the goal of improving predictions of the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of small molecules. Predicting ADMET properties is a key task in drug discovery, but the experimental error inherent in the data makes it hard to build better models. Because the data are continuous, ADMET prediction is usually a regression task, whereas most existing denoising methods target classification and do not transfer directly. The paper proposes deep learning-based denoising schemes that use the training error to identify noisy data points in regression tasks; ensemble-based and forgotten event-based metrics fail to detect the noise. The largest performance gain occurs when the original model is fine-tuned on the denoised data, with training error as the noise detection metric. The method improves models with medium noise levels and does not degrade models whose noise falls outside that range. The paper also examines how data imbalance, dataset size, and experimental error in the test set affect the denoising scheme, and investigates whether noise in multi-task models propagates between tasks and affects performance. To the authors' knowledge, this is the first denoising scheme shown to improve model performance for ADMET data.
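The training-error idea can be illustrated with a small Python sketch. This is not the authors' implementation: a linear model trained by gradient descent stands in for the deep learning model, the data are synthetic, and the names `fit_linear`, `denoise_and_finetune`, and the `drop_fraction` threshold are hypothetical choices made only for illustration.

```python
import numpy as np

def fit_linear(X, y, w=None, steps=500, lr=0.1):
    """Minimize mean squared error by gradient descent.

    Passing `w` continues training from existing weights, which plays
    the role of fine-tuning in this toy setting.
    """
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def denoise_and_finetune(X, y, drop_fraction=0.1):
    """Flag the highest-training-error samples as noisy, then fine-tune.

    `drop_fraction` is an assumed hyperparameter, not a value from the paper.
    """
    w0 = fit_linear(X, y)                       # original model on all data
    train_error = np.abs(X @ w0 - y)            # per-sample training error
    cutoff = np.quantile(train_error, 1.0 - drop_fraction)
    keep = train_error <= cutoff                # samples treated as clean
    w1 = fit_linear(X[keep], y[keep], w=w0, steps=200)  # fine-tune from w0
    return w0, w1, keep

# Synthetic regression data with a known ground truth (assumption).
rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.05, size=n)

# Corrupt 10% of the labels with large "experimental" error.
noisy_idx = rng.choice(n, size=40, replace=False)
y[noisy_idx] += rng.normal(scale=3.0, size=40)

w0, w1, keep = denoise_and_finetune(X, y, drop_fraction=0.1)
print("weight error before denoising:", np.linalg.norm(w0 - w_true))
print("weight error after denoising: ", np.linalg.norm(w1 - w_true))
print("fraction of corrupted labels flagged:", (~keep)[noisy_idx].mean())
```

In this toy setting the corrupted labels produce the largest residuals, so a simple quantile cutoff removes most of them. For real assay data the cutoff would need to be chosen with care, for example by validating on held-out measurements.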
