Enhancing ADMET Property Models Performance through Combinatorial Fusion Analysis.

Suman Sirimulla, D. Frank Hsu, Nan Jiang, Tudor Oprea, Mohammed Quazi, Christina Schweikert
DOI: https://doi.org/10.26434/chemrxiv-2023-dh70x
2023-11-29
Abstract: Accurate prediction of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties is crucial for drug discovery and development. However, existing computational models for ADMET prediction often lack generalizability and robustness. In this paper, we deploy Combinatorial Fusion Analysis (CFA) to enhance the performance of ADMET models. Utilizing ADMET benchmark datasets on Therapeutics Data Commons (TDC), we conduct a comprehensive evaluation against traditional and state-of-the-art models. CFA models show superior performance compared to most of the individual models. The CFA model architecture and the performance of CFA models on TDC and other internal datasets are discussed. This significant enhancement suggests that CFA is a viable tool for improving ADMET model performance, promising faster and more cost-effective drug development pipelines. The code and trained models are available on GitHub at https://github.com/FLIDM/CFA4DD.
Chemistry
What problem does this paper attempt to address?
This paper aims to improve models that predict absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties in drug development by applying Combinatorial Fusion Analysis (CFA). Existing ADMET prediction models often generalize poorly and lack robustness; CFA addresses this by combining the strengths of multiple models or descriptors.

The authors evaluate CFA on benchmark datasets from the Therapeutics Data Commons (TDC) and compare it with traditional and state-of-the-art models. As base models they pair different molecular feature representations (such as Morgan circular fingerprints and RDKit 2D molecular descriptors) with various machine learning algorithms (gradient boosting trees, random forests, support vector machines, AdaBoost, and convolutional neural networks), then fuse these base models with CFA.

The CFA models outperform most individual models across multiple ADMET-related datasets, which the authors take as evidence that CFA is a feasible and effective way to improve ADMET model performance and thereby support faster, more cost-effective drug development. The paper also discusses the differences between rank combination and score combination, as well as the role of cognitive diversity in CFA.
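The difference between score combination and rank combination, and the idea of cognitive diversity, can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). It follows the definitions commonly used in the CFA literature, assuming that scores are min-max normalized, that rank 1 is the highest score, and that cognitive diversity is the root-mean-square gap between two models' rank-score functions. The function names and the toy predictions `model_a` and `model_b` are hypothetical.

```python
import numpy as np

def normalize(scores):
    """Min-max scale scores to [0, 1] so different models are comparable."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def ranks(scores):
    """Rank 1 = highest score; ties are broken by order of appearance."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    r = np.empty(len(order), dtype=int)
    r[order] = np.arange(1, len(order) + 1)
    return r

def score_combination(score_lists):
    """Average the normalized scores per item (higher is better)."""
    return np.mean([normalize(s) for s in score_lists], axis=0)

def rank_combination(score_lists):
    """Average the ranks per item (lower is better)."""
    return np.mean([ranks(s) for s in score_lists], axis=0)

def rank_score_function(scores):
    """f(i): normalized score of the item sitting at rank i."""
    return np.sort(normalize(scores))[::-1]

def cognitive_diversity(a, b):
    """RMS distance between the rank-score functions of two models."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return float(np.sqrt(np.mean((fa - fb) ** 2)))

# Hypothetical predictions from two base models on four molecules.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.7, 0.1, 0.3, 0.8]

fused_scores = score_combination([model_a, model_b])
fused_ranks = rank_combination([model_a, model_b])
diversity = cognitive_diversity(model_a, model_b)

print("score combination:", fused_scores)
print("rank combination: ", fused_ranks)
print("cognitive diversity:", diversity)
```

In this toy case the two fusions order the last two molecules differently from either base model alone, which is the kind of behavior the choice between rank and score combination is meant to capture. In practice one would presumably compare both fusions on a validation split and favor combining models that are individually strong and mutually diverse.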