An Evolved Transformer Model for ADME/Tox Prediction

Changheng Shao, Fengjing Shao, Song Huang, Rencheng Sun, Tao Zhang
DOI: https://doi.org/10.3390/electronics13030624
IF: 2.9
2024-02-03
Electronics
Abstract: Drug discovery aims to keep supplying new medicines that cure or palliate the many ailments, including some currently untreatable diseases, that still afflict humanity. The ADME/Tox (absorption, distribution, metabolism, excretion/toxicity) properties of candidate drug molecules are key factors that determine the safety, uptake, elimination, metabolic behavior and effectiveness of drugs in research and development. Predicting ADME/Tox properties drastically reduces the fraction of pharmaceutics-related failures in the early stages of drug development. Driven by the expectation of accelerated timelines, reduced costs and the potential to reveal hidden insights from vast datasets, artificial intelligence techniques such as Graphormer are showing increasing promise and usefulness for building custom models for molecule modeling tasks. However, Graphormer and other transformer-based models do not consider molecular fingerprints or the physicochemical properties that have proved effective in traditional computational drug research. Here, we propose an enhanced model based on Graphormer which uses a tree model to integrate this known information and achieves better prediction and interpretability. More importantly, the model achieves new state-of-the-art results on ADME/Tox property prediction benchmarks, surpassing several challenging models. Experimental results demonstrate an average SMAPE (Symmetric Mean Absolute Percentage Error) of 18.9 and a PCC (Pearson Correlation Coefficient) of 0.86 on ADME/Tox prediction test sets. These findings highlight the efficacy of our approach and its potential to enhance drug discovery processes. By leveraging the strengths of Graphormer and incorporating additional molecular descriptors, our model offers improved predictive capabilities, thus contributing to the advancement of ADME/Tox prediction in drug development. The integration of various information sources further enables better interpretability, aiding researchers in understanding the underlying factors influencing the predictions. Overall, our work demonstrates the potential of our enhanced model to expedite drug discovery, reduce costs, and enhance the success rate of pharmaceutical development efforts.
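The two metrics quoted in the abstract can be made concrete with a short sketch. SMAPE has several variants in the literature, and the source does not say which one the authors use; the version below assumes the common form that divides each absolute error by the mean of the absolute true and predicted values and reports a percentage between 0 and 200. The function names `smape` and `pearson` are illustrative, not taken from the paper.

```python
import math


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed variant: |y - y_hat| / ((|y| + |y_hat|) / 2), averaged over samples.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        # Convention chosen here: a pair where both values are zero adds no error.
        if denom != 0:
            total += abs(t - p) / denom
    return 100.0 * total / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)
```

Lower SMAPE and higher PCC are better: SMAPE measures the relative size of the errors, while PCC measures how well the predictions track the ordering and spread of the measured values, so the two are usually reported together.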
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
What problem does this paper attempt to address?
The paper addresses the prediction of ADME/Tox (Absorption, Distribution, Metabolism, Excretion/Toxicity) properties in drug discovery. These properties are central to evaluating the safety, pharmacokinetic behavior, and efficacy of candidate drugs, and therefore to the success of drug development. The challenge is to screen and optimize drug molecules efficiently so that fewer candidates fail in the later stages of development because of poor ADME/Tox properties.
The paper proposes an improved version of the Graphormer model that adds a second training stage with a tree model. This stage integrates known information, namely molecular fingerprints and physicochemical properties, which have proven effective in traditional computational drug research. Specifically, the model first generates molecular embeddings with Graphormer, then combines these embeddings with molecular fingerprints and key physicochemical properties, and finally trains the CatBoost algorithm on the combined features. According to the paper, this improves prediction accuracy and also makes the model more interpretable, helping researchers understand the factors that drive its predictions.
On the ADME/Tox property prediction benchmark datasets, the model achieved an average SMAPE (Symmetric Mean Absolute Percentage Error) of 18.9% and a PCC (Pearson Correlation Coefficient) of 0.86. The authors take this as evidence that the method could help accelerate drug discovery, reduce costs, and increase the success rate of drug development.
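The feature-combination step described above can be sketched as a simple concatenation. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `fuse_features` and `feature_names`, the descriptor names, and all numeric values are hypothetical, and the embedding and fingerprint are short stand-ins for what Graphormer and a fingerprinting tool would produce.

```python
from typing import Dict, List, Sequence


def fuse_features(
    embedding: Sequence[float],
    fingerprint_bits: Sequence[int],
    descriptors: Dict[str, float],
    descriptor_order: Sequence[str],
) -> List[float]:
    """Concatenate one molecule's features into a single flat row.

    Order: learned embedding, then fingerprint bits, then physicochemical
    descriptors in the fixed order given by descriptor_order.
    """
    row = [float(x) for x in embedding]
    row.extend(float(b) for b in fingerprint_bits)
    row.extend(float(descriptors[name]) for name in descriptor_order)
    return row


def feature_names(
    embedding_dim: int, n_bits: int, descriptor_order: Sequence[str]
) -> List[str]:
    """Column names matching the layout produced by fuse_features."""
    names = [f"emb_{i}" for i in range(embedding_dim)]
    names += [f"fp_{i}" for i in range(n_bits)]
    names += list(descriptor_order)
    return names


# Toy inputs (illustrative values only, not data from the paper).
embedding = [0.12, -0.40, 0.93]          # stand-in for a Graphormer embedding
fingerprint = [1, 0, 0, 1]               # stand-in for a molecular fingerprint
descriptors = {"mol_weight": 180.16, "logp": 1.19}
order = ["mol_weight", "logp"]

row = fuse_features(embedding, fingerprint, descriptors, order)
names = feature_names(len(embedding), len(fingerprint), order)
```

Stacking one such row per molecule gives the feature matrix that a gradient-boosted tree regressor such as CatBoost would be trained on. One plausible reading of the interpretability claim is that named columns let tree feature importances be traced back to a specific fingerprint bit or physicochemical property, which is not possible for the raw embedding dimensions alone.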