MolToxPred: small molecule toxicity prediction using machine learning approach

Anjali Setiya, Vinod Jani, Uddhavesh Sonavane, Rajendra Joshi
DOI: https://doi.org/10.1039/d3ra07322j
IF: 4.036
2024-01-31
RSC Advances
Abstract: Different types of chemicals and products may pose various health risks when administered into the human body. For toxicity reasons, the number of new drugs entering the market through the conventional drug development process has been reduced over the years. However, with the advent of big data and artificial intelligence, machine learning techniques have emerged as a potential solution for predicting toxicity and ensuring efficient drug development and chemical safety. An ML model for toxicity prediction can reduce experimental costs and time while addressing ethical concerns by drastically reducing the need for animals and clinical trials. Herein, MolToxPred, an ML-based tool, has been developed using a stacked model approach to predict the potential toxicity of small molecules and metabolites. The stacked model consists of random forest, multi-layer perceptron, and LightGBM as base classifiers and Logistic Regression as the meta classifier. For training and validation purposes, a comprehensive set of toxic and non-toxic molecules was curated. Different structural and physicochemical-based features in the form of molecular descriptors and fingerprints were employed. MolToxPred utilizes a comprehensive feature selection process and optimizes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. In the evaluation phase, MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on an external validation set. The McNemar test was used as the post-hoc test to determine if the stacked model's performance was significantly different compared to the base learners. The developed stacked model outperformed its base classifiers and an existing tool in the literature, reaffirming its better performance. The hypothesis is that the incorporation of a diverse set of data, the subsequent feature selection, and a stacked ensemble approach give MolToxPred the edge over other methods.
In addition to this, an attempt has been made to identify structural alerts responsible for endpoints of the Tox21 data to determine the association of a molecule with a plausible downstream pathway of action. MolToxPred may be helpful for drug discovery and regulatory pipelines in pharmaceutical and other industries for in silico toxicity prediction of small molecule candidates.
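The McNemar comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which variant of the test was used, so the exact (binomial) form below is an assumption, and the function name `mcnemar_exact` and the toy labels are hypothetical. The test looks only at the molecules on which two classifiers disagree and asks whether one of them is right significantly more often than the other.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (illustrative only).

    Returns (b, c, p_value), where b counts samples that only model A
    classified correctly and c counts samples that only model B
    classified correctly.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The two models never disagree, so there is no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis each disagreement favours A or B with
    # probability 0.5, so min(b, c) follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: model A is correct everywhere, model B misses six samples.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 6 + [1] * 4
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```

A small p-value suggests that the two classifiers differ in error rate on the same samples. With many disagreements, the chi-square approximation of the test is a common alternative.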
chemistry, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the problem of predicting the toxicity of chemical substances. Specifically, the researchers have developed a machine learning tool named MolToxPred to predict the potential toxicity of small molecules and metabolites. In traditional drug development, toxicological screening is time-consuming, costly, and inefficient, which has contributed to a decrease in the number of new drugs entering the market. Additionally, animal experiments are controversial in terms of both ethics and accuracy. MolToxPred therefore uses machine learning to improve the accuracy and efficiency of toxicity prediction, reduce experimental costs and time, and minimize reliance on animal testing. It employs a stacked model approach, combining Random Forest, Multilayer Perceptron, and LightGBM as base classifiers, with Logistic Regression as the meta-classifier. The tool applies a comprehensive feature selection process and tunes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on the external validation set, outperforming its base classifiers and an existing tool. In summary, the main goal of this paper is to develop an efficient and accurate machine learning tool for predicting the toxicity of small molecule candidates, thereby assisting drug discovery and regulatory processes in the pharmaceutical and other industries.
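To make the reported metric concrete, the following is a minimal sketch of how an AUROC value can be computed from predicted scores. It uses the pairwise (Mann-Whitney) formulation: the fraction of toxic/non-toxic pairs in which the toxic molecule receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (illustrative only).

    y_true holds 1 for toxic and 0 for non-toxic; scores holds the
    predicted probability (or any ranking score) of the toxic class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # Count pairs where the positive outranks the negative; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two non-toxic and two toxic molecules.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of the 4 toxic/non-toxic pairs are ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 87.76% means that roughly 88 out of 100 randomly drawn toxic/non-toxic pairs are ranked in the right order. The pairwise form is quadratic in the number of samples; for large sets a rank-based implementation is preferable.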